---
license: mit
language: en
library_name: onnxruntime
datasets:
  - ucirvine/sms_spam
tags:
  - spam
  - sms
  - text-classification
  - onnx
  - quantized
  - edge-ai
model-index:
  - name: distilroberta-sms-spam-detector-onnx-quantized
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          name: ucirvine/sms_spam (test split)
          type: ucirvine/sms_spam
          config: plain_text
          split: test
        metrics:
          - type: f1
            value: 0.99
            name: F1 (Weighted, Quantized)
          - type: accuracy
            value: 0.99
            name: Accuracy (Quantized)
---

# DistilRoBERTa for SMS Spam Detection (INT8 ONNX, Quantized)

This repository contains a quantized, production-ready version of the distilroberta-sms-spam-detector model. The model has been converted to the ONNX format and its weights have been quantized to 8-bit integers (INT8) for optimal performance on edge devices like mobile phones.

This optimization yields a ~4x reduction in file size and significantly faster inference, with only a marginal decrease in accuracy.

This is the model intended for direct deployment in mobile applications.

The original, full-precision (FP32) model can be found in the main model repository.

## Model Description

- **Model type:** Quantized ONNX graph of a fine-tuned `distilroberta-base` model.
- **Intended use:** On-device spam classification for mobile applications.
- **Language(s):** English
- **License:** MIT
- **File size:** ~79 MB

This repository also contains a `version.txt` file for use with Over-the-Air (OTA) update systems.
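
As a rough sketch of how an OTA client might consume that file (the single-string version format and the local caching logic below are assumptions, not part of this repository's contract):

```python
# Hypothetical OTA check: compare a cached version string against the
# repo's version.txt and re-download the model when they differ.
from huggingface_hub import hf_hub_download

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"

version_path = hf_hub_download(repo_id=REPO_ID, filename="version.txt")
with open(version_path) as f:
    remote_version = f.read().strip()

local_version = "1.0.0"  # assumed: whatever version the device last cached
if remote_version != local_version:
    # Fetch the updated quantized model, then persist remote_version locally
    model_path = hf_hub_download(repo_id=REPO_ID, filename="model.quant.onnx")
```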

## How to Use (with ONNX Runtime)

This model is designed to be used with `onnxruntime`. The example below also uses the `huggingface_hub`, `transformers`, `numpy`, and `scipy` packages, so install those first.

```python
import numpy as np
import onnxruntime as ort
import scipy.special
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"
ONNX_MODEL_NAME = "model.quant.onnx"

# Download the quantized ONNX graph from the Hub
model_path = hf_hub_download(repo_id=REPO_ID, filename=ONNX_MODEL_NAME)

# Load the tokenizer from the same repository
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

session = ort.InferenceSession(model_path)

# Prepare text
text = "Congratulations! You've won a $1000 gift card. Click now!"
inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True)

# Run inference
outputs = session.run(None, dict(inputs))
scores = outputs[0][0]  # raw logits for the single example

# Convert logits to probabilities
probabilities = scipy.special.softmax(scores)
prediction = np.argmax(probabilities)

labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[prediction]}, Confidence: {probabilities[prediction]:.4f}")
# >> Prediction: SPAM, Confidence: 0.99...
```

## Quantization Procedure

The original FP32 PyTorch model was first exported to ONNX format using the optimum library. Subsequently, dynamic quantization was applied using the onnxruntime.quantization toolkit to convert the model's weights to INT8.

- **Library:** `onnxruntime`
- **Method:** `quantize_dynamic`
- **Weight type:** `QuantType.QInt8`
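
A minimal sketch of these two steps is shown below; the FP32 repository id and the file paths are assumptions made for illustration.

```python
# Sketch of the export + quantization pipeline. FP32_REPO is an assumed
# repo id (use the actual main model repository); paths are illustrative.
from onnxruntime.quantization import QuantType, quantize_dynamic
from optimum.onnxruntime import ORTModelForSequenceClassification

FP32_REPO = "SharpWoofer/distilroberta-sms-spam-detector"  # assumed repo id

# 1. Export the fine-tuned PyTorch model to an ONNX graph via optimum
model = ORTModelForSequenceClassification.from_pretrained(FP32_REPO, export=True)
model.save_pretrained("onnx_export/")  # writes onnx_export/model.onnx

# 2. Dynamically quantize the exported graph's weights to INT8
quantize_dynamic(
    model_input="onnx_export/model.onnx",
    model_output="model.quant.onnx",
    weight_type=QuantType.QInt8,
)
```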

## Performance & Trade-offs

The primary goal of quantization is to trade a small amount of precision for a large gain in efficiency. The evaluation below, conducted on the same 558-example test set, demonstrates the success of this trade-off.

**File Size:**

- **Original (FP32):** ~313 MB
- **Quantized (INT8):** ~79 MB (3.96x smaller)

**Accuracy Comparison:**

| Model | Class | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Original (FP32) | HAM | 1.00 | 1.00 | 1.00 |
| Original (FP32) | SPAM | 1.00 | 0.97 | 0.99 |
| Original (FP32) | Overall | 1.00 | 1.00 | 1.00 |
| Quantized (INT8) | HAM | 0.99 | 1.00 | 1.00 |
| Quantized (INT8) | SPAM | 1.00 | 0.96 | 0.98 |
| Quantized (INT8) | Overall | 0.99 | 0.99 | 0.99 |

As shown, the quantized model maintains perfect precision for SPAM detection and near-perfect precision for HAM, making it extremely reliable for on-device deployment.
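
For completeness, a hypothetical version of the evaluation loop might look like the following. The exact 558-example split behind the reported numbers is not reproduced here; the 10% hold-out below is an assumption for illustration only.

```python
# Hypothetical evaluation sketch -- ucirvine/sms_spam ships a single
# "train" split, so we carve out an assumed 10% test set.
import numpy as np
import onnxruntime as ort
from datasets import load_dataset
from huggingface_hub import hf_hub_download
from sklearn.metrics import classification_report
from transformers import AutoTokenizer

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.quant.onnx")
session = ort.InferenceSession(model_path)

dataset = load_dataset("ucirvine/sms_spam", "plain_text", split="train")
test_set = dataset.train_test_split(test_size=0.1, seed=42)["test"]

preds, labels = [], []
for example in test_set:
    enc = tokenizer(example["sms"], return_tensors="np", truncation=True)
    logits = session.run(None, dict(enc))[0][0]
    preds.append(int(np.argmax(logits)))
    labels.append(example["label"])

print(classification_report(labels, preds, target_names=["HAM", "SPAM"]))
```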