---
license: mit
language: en
library_name: onnxruntime
datasets:
- ucirvine/sms_spam
tags:
- spam
- sms
- text-classification
- onnx
- quantized
- edge-ai
model-index:
- name: distilroberta-sms-spam-detector-onnx-quantized
results:
- task:
type: text-classification
name: Text Classification
dataset:
name: ucirvine/sms_spam (test split)
type: ucirvine/sms_spam
config: plain_text
split: test
metrics:
- type: f1
value: 0.99
name: F1 (Weighted, Quantized)
- type: accuracy
value: 0.99
name: Accuracy (Quantized)
---
# DistilRoBERTa for SMS Spam Detection (INT8 ONNX, Quantized)
This repository contains a quantized, production-ready version of the `distilroberta-sms-spam-detector` model. The model has been converted to the ONNX format and its weights have been quantized to 8-bit integers (INT8) for optimal performance on edge devices like mobile phones.
This optimization yields a **~4x reduction in file size** and a significant improvement in inference speed, with only a marginal decrease in accuracy.
**This is the model intended for direct deployment in mobile applications.**
The original, full-precision (FP32) model is available in the [main model repository](https://huggingface.co/SharpWoofer/distilroberta-sms-spam-detector).
## Model Description
- **Model type:** Quantized ONNX graph of a fine-tuned `distilroberta-base` model.
- **Intended Use:** On-device spam classification for mobile applications.
- **Language(s):** English
- **License:** MIT
- **File Size:** ~79 MB
This repository also contains a `version.txt` file for use with Over-the-Air (OTA) update systems.
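For OTA use, a client can compare the remote `version.txt` against a locally cached copy and fetch the new model only when the versions differ. Below is a minimal sketch of that check; the local cache path and the plain-string version format are assumptions, not something this repository defines.
```python
# Hedged sketch of an OTA update check using version.txt; the local cache
# layout and version string format are assumptions, not part of this repo.
from pathlib import Path
from huggingface_hub import hf_hub_download

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"
LOCAL_VERSION = Path("cache/version.txt")  # hypothetical local cache location

# Fetch the remote version string from the Hub
remote_version = Path(
    hf_hub_download(repo_id=REPO_ID, filename="version.txt")
).read_text().strip()

local_version = LOCAL_VERSION.read_text().strip() if LOCAL_VERSION.exists() else None

if remote_version != local_version:
    # A newer model is available: download it and record the new version
    model_path = hf_hub_download(repo_id=REPO_ID, filename="model.quant.onnx")
    LOCAL_VERSION.parent.mkdir(parents=True, exist_ok=True)
    LOCAL_VERSION.write_text(remote_version)
```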
## How to Use (with ONNX Runtime)
This model is designed to be used with `onnxruntime`.
```python
import onnxruntime as ort
import numpy as np
import scipy.special
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"
ONNX_MODEL_NAME = "model.quant.onnx"

# Download the quantized ONNX graph from the Hub
model_path = hf_hub_download(repo_id=REPO_ID, filename=ONNX_MODEL_NAME)
# Load the tokenizer from the same repository
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
# Create an ONNX Runtime inference session
session = ort.InferenceSession(model_path)
# Prepare text
text = "Congratulations! You've won a $1000 gift card. Click now!"
inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True)
# Run inference
outputs = session.run(None, dict(inputs))
scores = outputs[0][0] # Get the raw logits
# Convert logits to probabilities
probabilities = scipy.special.softmax(scores)
prediction = np.argmax(probabilities)
labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[prediction]}, Confidence: {probabilities[prediction]:.4f}")
# >> Prediction: SPAM, Confidence: 0.99...
```
## Quantization Procedure
The original FP32 PyTorch model was first exported to ONNX using the `optimum` library. Dynamic quantization was then applied with the `onnxruntime.quantization` toolkit to convert the model's weights to `INT8` (see the sketch after the list below).
- **Library:** `onnxruntime`
- **Method:** `quantize_dynamic`
- **Weight Type:** `QuantType.QInt8`
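A minimal sketch of that export-then-quantize pipeline is shown below. The exact export arguments the author used are not published, so the output paths and default settings here are assumptions.
```python
# Hedged sketch of the export + dynamic quantization pipeline described above.
from optimum.onnxruntime import ORTModelForSequenceClassification
from onnxruntime.quantization import quantize_dynamic, QuantType

# Export the fine-tuned FP32 PyTorch model to an ONNX graph via optimum
ort_model = ORTModelForSequenceClassification.from_pretrained(
    "SharpWoofer/distilroberta-sms-spam-detector", export=True
)
ort_model.save_pretrained("onnx_fp32")  # writes onnx_fp32/model.onnx

# Apply dynamic INT8 quantization to the exported weights
quantize_dynamic(
    model_input="onnx_fp32/model.onnx",
    model_output="model.quant.onnx",
    weight_type=QuantType.QInt8,
)
```
Dynamic quantization converts weights to INT8 ahead of time while computing activation scales at runtime, which is why no calibration dataset is needed for this step.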
## Performance & Trade-offs
The primary goal of quantization is to trade a small amount of precision for a large gain in efficiency. The evaluation below, conducted on the same 558-example test set, demonstrates the success of this trade-off.
### **File Size:**
- **Original (FP32):** ~313 MB
- **Quantized (INT8):** ~79 MB (**3.96x smaller**)
### **Accuracy Comparison:**
| Model | Class | Precision | Recall | F1-Score |
| :--- | :--- | :--- | :--- | :--- |
| **Original (FP32)** | HAM | 1.00 | 1.00 | 1.00 |
| | SPAM | 1.00 | 0.97 | 0.99 |
| | Overall (weighted) | 1.00 | 1.00 | 1.00 |
| **Quantized (INT8)** | HAM | 0.99 | 1.00 | 1.00 |
| | SPAM | 1.00 | 0.96 | 0.98 |
| | Overall (weighted) | 0.99 | 0.99 | 0.99 |
As the table shows, the quantized model keeps perfect SPAM precision on this test set (no HAM message is flagged as spam) and near-perfect HAM precision, at the cost of a one-point drop in SPAM recall (0.97 → 0.96), making it well suited to on-device deployment.
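To reproduce an evaluation along these lines, the quantized model can be scored against a held-out slice of `ucirvine/sms_spam`. The exact 558-example split is not published, so the 10% hold-out and seed below are assumptions and the resulting numbers may differ slightly from the table above.
```python
# Hedged sketch: re-evaluating the quantized model. The author's exact
# 558-example test split is not published; a 10% hold-out with an assumed
# seed is used for illustration only.
import numpy as np
import onnxruntime as ort
from datasets import load_dataset
from huggingface_hub import hf_hub_download
from sklearn.metrics import classification_report
from transformers import AutoTokenizer

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.quant.onnx")
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
session = ort.InferenceSession(model_path)

# sms_spam ships a single "train" split, so carve out a hold-out set
ds = load_dataset("ucirvine/sms_spam", "plain_text", split="train")
test = ds.train_test_split(test_size=0.1, seed=42)["test"]  # assumed split

preds = []
for ex in test:
    inputs = tokenizer(ex["sms"], return_tensors="np", truncation=True)
    logits = session.run(None, dict(inputs))[0][0]
    preds.append(int(np.argmax(logits)))

# Label 0 = HAM, 1 = SPAM in this dataset
print(classification_report(test["label"], preds, target_names=["HAM", "SPAM"]))
```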