---
license: mit
language: en
library_name: onnxruntime
datasets:
- ucirvine/sms_spam
tags:
- spam
- sms
- text-classification
- onnx
- quantized
- edge-ai
model-index:
- name: distilroberta-sms-spam-detector-onnx-quantized
results:
- task:
type: text-classification
name: Text Classification
dataset:
name: ucirvine/sms_spam (test split)
type: ucirvine/sms_spam
config: plain_text
split: test
metrics:
- type: f1
value: 0.99
name: F1 (Weighted, Quantized)
- type: accuracy
value: 0.99
name: Accuracy (Quantized)
---
# DistilRoBERTa for SMS Spam Detection (INT8 ONNX, Quantized)
This repository contains a quantized, production-ready version of the `distilroberta-sms-spam-detector` model. The model has been converted to the ONNX format and its weights have been quantized to 8-bit integers (INT8) for optimal performance on edge devices like mobile phones.
This optimization yields a **~4x reduction in file size** and a significant improvement in inference speed, with only a marginal decrease in accuracy.
**This is the model intended for direct deployment in mobile applications.**
The original, full-precision (FP32) model is available in the [main model repository](https://huggingface.co/SharpWoofer/distilroberta-sms-spam-detector).
## Model Description
- **Model type:** Quantized ONNX graph of a fine-tuned `distilroberta-base` model.
- **Intended Use:** On-device spam classification for mobile applications.
- **Language(s):** English
- **License:** MIT
- **File Size:** ~79 MB
This repository also contains a `version.txt` file for use with Over-the-Air (OTA) update systems.
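For OTA use, a client can compare the remote `version.txt` against a locally cached copy and fetch the new model only when the versions differ. Below is a minimal sketch of that check; the local cache path and the plain-string version format are assumptions, not something this repository defines.
```python
# Hedged sketch of an OTA update check using version.txt; the local cache
# layout and version string format are assumptions, not part of this repo.
from pathlib import Path
from huggingface_hub import hf_hub_download

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"
LOCAL_VERSION = Path("cache/version.txt")  # hypothetical local cache location

# Fetch the remote version string from the Hub
remote_version = Path(
    hf_hub_download(repo_id=REPO_ID, filename="version.txt")
).read_text().strip()

local_version = LOCAL_VERSION.read_text().strip() if LOCAL_VERSION.exists() else None

if remote_version != local_version:
    # A newer model is available: download it and record the new version
    model_path = hf_hub_download(repo_id=REPO_ID, filename="model.quant.onnx")
    LOCAL_VERSION.parent.mkdir(parents=True, exist_ok=True)
    LOCAL_VERSION.write_text(remote_version)
```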
## How to Use (with ONNX Runtime)
This model is designed to be used with `onnxruntime`.
```python
import onnxruntime as ort
import numpy as np
import scipy.special
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"
ONNX_MODEL_NAME = "model.quant.onnx"

# Download the quantized ONNX graph from the Hub
model_path = hf_hub_download(repo_id=REPO_ID, filename=ONNX_MODEL_NAME)
# Load the tokenizer from the same repository
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
# Create an ONNX Runtime inference session
session = ort.InferenceSession(model_path)
# Prepare text
text = "Congratulations! You've won a $1000 gift card. Click now!"
inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True)
# Run inference
outputs = session.run(None, dict(inputs))
scores = outputs[0][0] # Get the raw logits
# Convert logits to probabilities
probabilities = scipy.special.softmax(scores)
prediction = np.argmax(probabilities)
labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[prediction]}, Confidence: {probabilities[prediction]:.4f}")
# >> Prediction: SPAM, Confidence: 0.99...
```
## Quantization Procedure
The original FP32 PyTorch model was first exported to ONNX using the `optimum` library. Dynamic quantization was then applied with the `onnxruntime.quantization` toolkit to convert the model's weights to `INT8` (see the sketch after the list below).
- **Library:** `onnxruntime`
- **Method:** `quantize_dynamic`
- **Weight Type:** `QuantType.QInt8`
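A minimal sketch of that export-then-quantize pipeline is shown below. The exact export arguments the author used are not published, so the output paths and default settings here are assumptions.
```python
# Hedged sketch of the export + dynamic quantization pipeline described above.
from optimum.onnxruntime import ORTModelForSequenceClassification
from onnxruntime.quantization import quantize_dynamic, QuantType

# Export the fine-tuned FP32 PyTorch model to an ONNX graph via optimum
ort_model = ORTModelForSequenceClassification.from_pretrained(
    "SharpWoofer/distilroberta-sms-spam-detector", export=True
)
ort_model.save_pretrained("onnx_fp32")  # writes onnx_fp32/model.onnx

# Apply dynamic INT8 quantization to the exported weights
quantize_dynamic(
    model_input="onnx_fp32/model.onnx",
    model_output="model.quant.onnx",
    weight_type=QuantType.QInt8,
)
```
Dynamic quantization converts weights to INT8 ahead of time while computing activation scales at runtime, which is why no calibration dataset is needed for this step.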
## Performance & Trade-offs
The primary goal of quantization is to trade a small amount of precision for a large gain in efficiency. The evaluation below, conducted on the same 558-example test set, demonstrates the success of this trade-off.
### **File Size:**
- **Original (FP32):** ~313 MB
- **Quantized (INT8):** ~79 MB (**3.96x smaller**)
### **Accuracy Comparison:**
| Model | Class | Precision | Recall | F1-Score |
| :--- | :--- | :--- | :--- | :--- |
| **Original (FP32)** | HAM | 1.00 | 1.00 | 1.00 |
| | SPAM | 1.00 | 0.97 | 0.99 |
| | Overall (weighted) | 1.00 | 1.00 | 1.00 |
| **Quantized (INT8)** | HAM | 0.99 | 1.00 | 1.00 |
| | SPAM | 1.00 | 0.96 | 0.98 |
| | Overall (weighted) | 0.99 | 0.99 | 0.99 |
As the table shows, the quantized model keeps perfect SPAM precision on this test set (no HAM message is flagged as spam) and near-perfect HAM precision, at the cost of a one-point drop in SPAM recall (0.97 → 0.96), making it well suited to on-device deployment.
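To reproduce an evaluation along these lines, the quantized model can be scored against a held-out slice of `ucirvine/sms_spam`. The exact 558-example split is not published, so the 10% hold-out and seed below are assumptions and the resulting numbers may differ slightly from the table above.
```python
# Hedged sketch: re-evaluating the quantized model. The author's exact
# 558-example test split is not published; a 10% hold-out with an assumed
# seed is used for illustration only.
import numpy as np
import onnxruntime as ort
from datasets import load_dataset
from huggingface_hub import hf_hub_download
from sklearn.metrics import classification_report
from transformers import AutoTokenizer

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.quant.onnx")
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
session = ort.InferenceSession(model_path)

# sms_spam ships a single "train" split, so carve out a hold-out set
ds = load_dataset("ucirvine/sms_spam", "plain_text", split="train")
test = ds.train_test_split(test_size=0.1, seed=42)["test"]  # assumed split

preds = []
for ex in test:
    inputs = tokenizer(ex["sms"], return_tensors="np", truncation=True)
    logits = session.run(None, dict(inputs))[0][0]
    preds.append(int(np.argmax(logits)))

# Label 0 = HAM, 1 = SPAM in this dataset
print(classification_report(test["label"], preds, target_names=["HAM", "SPAM"]))
```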