---
license: mit
language: en
library_name: onnxruntime
datasets:
- ucirvine/sms_spam
tags:
- spam
- sms
- text-classification
- onnx
- quantized
- edge-ai
model-index:
- name: distilroberta-sms-spam-detector-onnx-quantized
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: ucirvine/sms_spam (test split)
      type: ucirvine/sms_spam
      config: plain_text
      split: test
    metrics:
    - type: f1
      value: 0.99
      name: F1 (Weighted, Quantized)
    - type: accuracy
      value: 0.99
      name: Accuracy (Quantized)
---

# DistilRoBERTa for SMS Spam Detection (INT8 ONNX, Quantized)

This repository contains a quantized, production-ready version of the `distilroberta-sms-spam-detector` model. The model has been converted to the ONNX format and its weights have been quantized to 8-bit integers (INT8) for efficient inference on edge devices such as mobile phones.

This optimization yields a **~4x reduction in file size** and a significant improvement in inference speed, at the cost of only a marginal decrease in accuracy.

**This is the model intended for direct deployment in mobile applications.** The original, full-precision (FP32) model can be found at the [main model repository](https://huggingface.co/SharpWoofer/distilroberta-sms-spam-detector).

## Model Description

- **Model type:** Quantized ONNX graph of a fine-tuned `distilroberta-base` model.
- **Intended use:** On-device spam classification for mobile applications.
- **Language(s):** English
- **License:** MIT
- **File size:** ~79 MB

This repository also contains a `version.txt` file for use with Over-the-Air (OTA) update systems.

## How to Use (with ONNX Runtime)

This model is designed to be used with `onnxruntime`.

```python
import numpy as np
import onnxruntime as ort
import scipy.special
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"
ONNX_MODEL_NAME = "model.quant.onnx"

# Download the quantized ONNX graph from the Hub
model_path = hf_hub_download(repo_id=REPO_ID, filename=ONNX_MODEL_NAME)

# Load the tokenizer from the same repository
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

session = ort.InferenceSession(model_path)

# Prepare text
text = "Congratulations! You've won a $1000 gift card. Click now!"
inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True)

# Run inference
outputs = session.run(None, dict(inputs))
scores = outputs[0][0]  # Raw logits for the single input example

# Convert logits to probabilities
probabilities = scipy.special.softmax(scores)
prediction = np.argmax(probabilities)

labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[prediction]}, Confidence: {probabilities[prediction]:.4f}")
# >> Prediction: SPAM, Confidence: 0.99...
```

## Quantization Procedure

The original FP32 PyTorch model was first exported to ONNX format using the `optimum` library. Dynamic quantization was then applied with the `onnxruntime.quantization` toolkit to convert the model's weights to INT8. A sketch of this procedure is included at the end of this card.

- **Library:** `onnxruntime`
- **Method:** `quantize_dynamic`
- **Weight type:** `QuantType.QInt8`

## Performance & Trade-offs

The goal of quantization is to trade a small amount of precision for a large gain in efficiency. The evaluation below, conducted on the same 558-example test set used for the FP32 model, demonstrates that this trade-off pays off.
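For reference, the comparison can be reproduced roughly as follows. This is a minimal sketch, not the original evaluation script: `ucirvine/sms_spam` ships a single `train` split, so the 90/10 split and its seed below are assumptions standing in for the original held-out set (10% of the 5,574 messages gives the 558 examples cited above), and `scikit-learn`'s `classification_report` is used for the metrics.

```python
import numpy as np
import onnxruntime as ort
from datasets import load_dataset
from huggingface_hub import hf_hub_download
from sklearn.metrics import classification_report
from transformers import AutoTokenizer

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.quant.onnx")
session = ort.InferenceSession(model_path)

# The dataset has only a 'train' split; this 10% hold-out (and its seed)
# is an assumption standing in for the original test set.
dataset = load_dataset("ucirvine/sms_spam", "plain_text", split="train")
test_set = dataset.train_test_split(test_size=0.1, seed=42)["test"]

predictions = []
for sms in test_set["sms"]:
    inputs = tokenizer(sms, return_tensors="np", padding="max_length", truncation=True)
    logits = session.run(None, dict(inputs))[0]
    predictions.append(int(np.argmax(logits, axis=-1)[0]))

# In ucirvine/sms_spam, label 0 = ham and label 1 = spam
print(classification_report(test_set["label"], predictions, target_names=["HAM", "SPAM"]))
```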
### File Size

- **Original (FP32):** ~313 MB
- **Quantized (INT8):** ~79 MB (**3.96x smaller**)

### Accuracy Comparison

| Model | Class | Precision | Recall | F1-Score |
| :--- | :--- | :--- | :--- | :--- |
| **Original (FP32)** | HAM | 1.00 | 1.00 | 1.00 |
| | SPAM | 1.00 | 0.97 | 0.99 |
| | Overall | 1.00 | 1.00 | 1.00 |
| **Quantized (INT8)** | HAM | 0.99 | 1.00 | 1.00 |
| | SPAM | 1.00 | 0.96 | 0.98 |
| | Overall | 0.99 | 0.99 | 0.99 |

As shown, the quantized model retains perfect SPAM precision (no legitimate messages are flagged as spam) and near-perfect HAM metrics; the only measurable costs are one-point drops in HAM precision (1.00 → 0.99) and SPAM recall (0.97 → 0.96), a trade-off that makes it well suited to on-device deployment.
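Finally, the procedure described under **Quantization Procedure** corresponds roughly to the sketch below. The exact export arguments used for this repository are not recorded on this card, and the intermediate directory name `onnx_fp32` is illustrative.

```python
from onnxruntime.quantization import QuantType, quantize_dynamic
from optimum.onnxruntime import ORTModelForSequenceClassification

# 1. Export the fine-tuned FP32 PyTorch model to an ONNX graph via optimum
ort_model = ORTModelForSequenceClassification.from_pretrained(
    "SharpWoofer/distilroberta-sms-spam-detector", export=True
)
ort_model.save_pretrained("onnx_fp32")  # writes onnx_fp32/model.onnx

# 2. Dynamically quantize the weights to INT8 with onnxruntime
quantize_dynamic(
    model_input="onnx_fp32/model.onnx",
    model_output="model.quant.onnx",
    weight_type=QuantType.QInt8,
)
```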