---
license: mit
language: en
library_name: onnxruntime
datasets:
- ucirvine/sms_spam
tags:
- spam
- sms
- text-classification
- onnx
- quantized
- edge-ai
model-index:
- name: distilroberta-sms-spam-detector-onnx-quantized
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: ucirvine/sms_spam (test split)
      type: ucirvine/sms_spam
      config: plain_text
      split: test
    metrics:
    - type: f1
      value: 0.99
      name: F1 (Weighted, Quantized)
    - type: accuracy
      value: 0.99
      name: Accuracy (Quantized)
---

# DistilRoBERTa for SMS Spam Detection (INT8 ONNX, Quantized)

This repository contains a quantized, production-ready version of the `distilroberta-sms-spam-detector` model. The model has been converted to the ONNX format and its weights have been quantized to 8-bit integers (INT8) for efficient inference on edge devices such as mobile phones.

This optimization yields a **~4x reduction in file size** and a significant improvement in inference speed, at the cost of only a marginal decrease in accuracy.

**This is the model intended for direct deployment in mobile applications.** The original, full-precision (FP32) model can be found at the [main model repository](https://huggingface.co/SharpWoofer/distilroberta-sms-spam-detector).

## Model Description

- **Model type:** Quantized ONNX graph of a fine-tuned `distilroberta-base` model.
- **Intended use:** On-device spam classification for mobile applications.
- **Language(s):** English
- **License:** MIT
- **File size:** ~79 MB

This repository also contains a `version.txt` file for use with Over-the-Air (OTA) update systems.

## How to Use (with ONNX Runtime)

This model is designed to be used with `onnxruntime`.

```python
import numpy as np
import onnxruntime as ort
import scipy.special
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"
ONNX_MODEL_NAME = "model.quant.onnx"

# Download the quantized ONNX graph from the Hub
model_path = hf_hub_download(repo_id=REPO_ID, filename=ONNX_MODEL_NAME)

# Load the tokenizer from the same repository
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

session = ort.InferenceSession(model_path)

# Prepare text
text = "Congratulations! You've won a $1000 gift card. Click now!"
inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True)

# Run inference
outputs = session.run(None, dict(inputs))
scores = outputs[0][0]  # Raw logits for the single input example

# Convert logits to probabilities
probabilities = scipy.special.softmax(scores)
prediction = np.argmax(probabilities)

labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[prediction]}, Confidence: {probabilities[prediction]:.4f}")
# >> Prediction: SPAM, Confidence: 0.99...
```

## Quantization Procedure

The original FP32 PyTorch model was first exported to ONNX format using the `optimum` library. Dynamic quantization was then applied with the `onnxruntime.quantization` toolkit to convert the model's weights to INT8. A sketch of this procedure is included at the end of this card.

- **Library:** `onnxruntime`
- **Method:** `quantize_dynamic`
- **Weight type:** `QuantType.QInt8`

## Performance & Trade-offs

The goal of quantization is to trade a small amount of precision for a large gain in efficiency. The evaluation below, conducted on the same 558-example test set used for the FP32 model, demonstrates that this trade-off pays off.
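For reference, the comparison can be reproduced roughly as follows. This is a minimal sketch, not the original evaluation script: `ucirvine/sms_spam` ships a single `train` split, so the 90/10 split and its seed below are assumptions standing in for the original held-out set (10% of the 5,574 messages gives the 558 examples cited above), and `scikit-learn`'s `classification_report` is used for the metrics.

```python
import numpy as np
import onnxruntime as ort
from datasets import load_dataset
from huggingface_hub import hf_hub_download
from sklearn.metrics import classification_report
from transformers import AutoTokenizer

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.quant.onnx")
session = ort.InferenceSession(model_path)

# The dataset has only a 'train' split; this 10% hold-out (and its seed)
# is an assumption standing in for the original test set.
dataset = load_dataset("ucirvine/sms_spam", "plain_text", split="train")
test_set = dataset.train_test_split(test_size=0.1, seed=42)["test"]

predictions = []
for sms in test_set["sms"]:
    inputs = tokenizer(sms, return_tensors="np", padding="max_length", truncation=True)
    logits = session.run(None, dict(inputs))[0]
    predictions.append(int(np.argmax(logits, axis=-1)[0]))

# In ucirvine/sms_spam, label 0 = ham and label 1 = spam
print(classification_report(test_set["label"], predictions, target_names=["HAM", "SPAM"]))
```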
### File Size

- **Original (FP32):** ~313 MB
- **Quantized (INT8):** ~79 MB (**3.96x smaller**)

### Accuracy Comparison

| Model | Class | Precision | Recall | F1-Score |
| :--- | :--- | :--- | :--- | :--- |
| **Original (FP32)** | HAM | 1.00 | 1.00 | 1.00 |
| | SPAM | 1.00 | 0.97 | 0.99 |
| | Overall | 1.00 | 1.00 | 1.00 |
| **Quantized (INT8)** | HAM | 0.99 | 1.00 | 1.00 |
| | SPAM | 1.00 | 0.96 | 0.98 |
| | Overall | 0.99 | 0.99 | 0.99 |

As shown, the quantized model retains perfect SPAM precision (no legitimate messages are flagged as spam) and near-perfect HAM metrics; the only measurable costs are one-point drops in HAM precision (1.00 → 0.99) and SPAM recall (0.97 → 0.96), a trade-off that makes it well suited to on-device deployment.
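Finally, the procedure described under **Quantization Procedure** corresponds roughly to the sketch below. The exact export arguments used for this repository are not recorded on this card, and the intermediate directory name `onnx_fp32` is illustrative.

```python
from onnxruntime.quantization import QuantType, quantize_dynamic
from optimum.onnxruntime import ORTModelForSequenceClassification

# 1. Export the fine-tuned FP32 PyTorch model to an ONNX graph via optimum
ort_model = ORTModelForSequenceClassification.from_pretrained(
    "SharpWoofer/distilroberta-sms-spam-detector", export=True
)
ort_model.save_pretrained("onnx_fp32")  # writes onnx_fp32/model.onnx

# 2. Dynamically quantize the weights to INT8 with onnxruntime
quantize_dynamic(
    model_input="onnx_fp32/model.onnx",
    model_output="model.quant.onnx",
    weight_type=QuantType.QInt8,
)
```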