---
license: mit
language: en
library_name: onnxruntime
datasets:
- ucirvine/sms_spam
tags:
- spam
- sms
- text-classification
- onnx
- quantized
- edge-ai
model-index:
- name: distilroberta-sms-spam-detector-onnx-quantized
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: ucirvine/sms_spam (test split)
      type: ucirvine/sms_spam
      config: plain_text
      split: test
    metrics:
    - type: f1
      value: 0.99
      name: F1 (Weighted, Quantized)
    - type: accuracy
      value: 0.99
      name: Accuracy (Quantized)
---

# DistilRoBERTa for SMS Spam Detection (INT8 ONNX, Quantized)

This repository contains a quantized, production-ready version of the `distilroberta-sms-spam-detector` model. The model has been converted to the ONNX format, and its weights have been quantized to 8-bit integers (INT8) for efficient inference on edge devices such as mobile phones.

This optimization resulted in a **~4x reduction in file size** and a significant improvement in inference speed, with only a marginal, acceptable decrease in accuracy.

**This is the model intended for direct deployment in mobile applications.**

The original, full-precision (FP32) model can be found in the [main model repository](https://huggingface.co/SharpWoofer/distilroberta-sms-spam-detector).

## Model Description

- **Model type:** Quantized ONNX graph of a fine-tuned `distilroberta-base` model.
- **Intended Use:** On-device spam classification for mobile applications.
- **Language(s):** English
- **License:** MIT
- **File Size:** ~79 MB

This repository also contains a `version.txt` file for use with Over-the-Air (OTA) update systems.

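As an illustration, a client app can compare this file against its locally cached copy before re-downloading the model. The snippet below is a minimal sketch only: it assumes `version.txt` contains a single version string, and the local paths and helper names are illustrative, not part of this repository's API.

```python
# Hedged sketch of an OTA update check. Assumes version.txt holds a single
# version string; local paths and helper names are illustrative only.
from pathlib import Path

from huggingface_hub import hf_hub_download

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"
LOCAL_VERSION_FILE = Path("model_cache/version.txt")  # version shipped with the app


def remote_version() -> str:
    """Fetch version.txt from the Hub and return its contents."""
    path = hf_hub_download(repo_id=REPO_ID, filename="version.txt")
    return Path(path).read_text().strip()


def needs_update() -> bool:
    """Compare the remote version against the locally cached one."""
    local = LOCAL_VERSION_FILE.read_text().strip() if LOCAL_VERSION_FILE.exists() else ""
    return remote_version() != local


if needs_update():
    # Pull the latest quantized model; hf_hub_download caches it locally.
    model_path = hf_hub_download(repo_id=REPO_ID, filename="model.quant.onnx")
    print(f"Updated model downloaded to {model_path}")
```
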
## How to Use (with ONNX Runtime)

This model is designed to be used with `onnxruntime`.

```python
import numpy as np
import onnxruntime as ort
import scipy.special
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"
ONNX_MODEL_NAME = "model.quant.onnx"

# Download the quantized ONNX graph from the Hub
model_path = hf_hub_download(repo_id=REPO_ID, filename=ONNX_MODEL_NAME)

# Load the tokenizer from the same repository
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

# Create the ONNX Runtime inference session
session = ort.InferenceSession(model_path)

# Prepare text
text = "Congratulations! You've won a $1000 gift card. Click now!"
inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True)

# Run inference
outputs = session.run(None, dict(inputs))
scores = outputs[0][0]  # Get the raw logits

# Convert logits to probabilities
probabilities = scipy.special.softmax(scores)
prediction = np.argmax(probabilities)

labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[prediction]}, Confidence: {probabilities[prediction]:.4f}")
# >> Prediction: SPAM, Confidence: 0.99...
```

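For scoring several messages in one call, the same tokenizer and session accept a padded batch. A minimal sketch, reusing `tokenizer`, `session`, and `labels` from the example above (the sample texts are illustrative):

```python
# Batch inference, reusing tokenizer, session, and labels from the example above.
texts = [
    "Are we still on for lunch tomorrow?",
    "URGENT! Your account has been suspended. Verify immediately.",
]
batch = tokenizer(texts, return_tensors="np", padding=True, truncation=True)
logits = session.run(None, dict(batch))[0]
predictions = np.argmax(logits, axis=-1)

for text, pred in zip(texts, predictions):
    print(f"{labels[pred]}: {text}")
```
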
## Quantization Procedure

The original FP32 PyTorch model was first exported to ONNX format using the `optimum` library. Dynamic quantization was then applied with the `onnxruntime.quantization` toolkit to convert the model's weights to INT8.

- **Library:** `onnxruntime`
- **Method:** `quantize_dynamic`
- **Weight Type:** `QuantType.QInt8`

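A minimal sketch of this export-and-quantize pipeline is shown below. It assumes the original FP32 repository as the starting point; the output paths are illustrative, and the author's exact export settings may differ.

```python
# Hedged sketch of the ONNX export + dynamic INT8 quantization pipeline.
# Output paths are illustrative; exact export settings may differ.
from onnxruntime.quantization import QuantType, quantize_dynamic
from optimum.onnxruntime import ORTModelForSequenceClassification

# 1. Export the fine-tuned FP32 PyTorch model to ONNX via optimum.
ort_model = ORTModelForSequenceClassification.from_pretrained(
    "SharpWoofer/distilroberta-sms-spam-detector", export=True
)
ort_model.save_pretrained("onnx_fp32")  # writes onnx_fp32/model.onnx

# 2. Apply dynamic quantization: weights are stored as INT8, while
#    activations are quantized on the fly at inference time.
quantize_dynamic(
    model_input="onnx_fp32/model.onnx",
    model_output="model.quant.onnx",
    weight_type=QuantType.QInt8,
)
```
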
## Performance & Trade-offs

The primary goal of quantization is to trade a small amount of precision for a large gain in efficiency. The evaluation below, conducted on the same 558-example test set as the original model, demonstrates the success of this trade-off.

### **File Size:**

- **Original (FP32):** ~313 MB
- **Quantized (INT8):** ~79 MB (**3.96x smaller**)

### **Accuracy Comparison:**

| Model | Class | Precision | Recall | F1-Score |
| :--- | :--- | :--- | :--- | :--- |
| **Original (FP32)** | **HAM** | **1.00** | **1.00** | **1.00** |
| | **SPAM** | **1.00** | **0.97** | **0.99** |
| | Overall | 1.00 | 1.00 | 1.00 |
| **Quantized (INT8)** | **HAM** | 0.99 | **1.00** | **1.00** |
| | **SPAM** | **1.00** | 0.96 | 0.98 |
| | Overall | 0.99 | 0.99 | 0.99 |

As shown, the quantized model maintains perfect precision for SPAM detection and near-perfect precision for HAM, making it extremely reliable for on-device deployment.
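
For reference, a comparable evaluation can be run with a script along the following lines. This is a hedged sketch: the exact 558-example split used above is not published here, so it assumes a 10% held-out split of `ucirvine/sms_spam` with a fixed seed, which may not match the author's split exactly.

```python
# Hedged evaluation sketch. The exact 558-example test split is not published,
# so a 10% held-out split of ucirvine/sms_spam (fixed seed) is assumed here.
import numpy as np
import onnxruntime as ort
from datasets import load_dataset
from huggingface_hub import hf_hub_download
from sklearn.metrics import classification_report
from transformers import AutoTokenizer

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
session = ort.InferenceSession(hf_hub_download(repo_id=REPO_ID, filename="model.quant.onnx"))

# ucirvine/sms_spam ships a single 'train' split; carve out a held-out test set.
dataset = load_dataset("ucirvine/sms_spam", "plain_text", split="train")
test_set = dataset.train_test_split(test_size=0.1, seed=42)["test"]

predictions = []
for text in test_set["sms"]:
    enc = tokenizer(text, return_tensors="np", truncation=True)
    logits = session.run(None, dict(enc))[0]
    predictions.append(int(np.argmax(logits, axis=-1)[0]))

# In the sms_spam dataset, label 0 = ham and label 1 = spam.
print(classification_report(test_set["label"], predictions, target_names=["HAM", "SPAM"]))
```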