---
language:
- pl
license: gpl-3.0
tags:
- text-classification
- emotion-classification
- sentiment-analysis
- polish
- multi-label-classification
- twitter
datasets:
- yazoniak/TwitterEmo-PL-Refined
base_model: PKOBP/polish-roberta-8k
metrics:
- f1
- accuracy
pipeline_tag: text-classification
model-index:
- name: twitter-emotion-pl-classifier
  results:
  - task:
      type: text-classification
      name: Multi-Label Emotion Classification
    dataset:
      type: yazoniak/TwitterEmo-PL-Refined
      name: TwitterEmo-PL-Refined
      split: validation
    metrics:
    - type: f1
      value: 0.8500
      name: F1 Macro
      verified: true
      args:
        average: macro
    - type: f1
      value: 0.8900
      name: F1 Micro
      verified: true
      args:
        average: micro
    - type: f1
      value: 0.8895
      name: F1 Weighted
      verified: true
      args:
        average: weighted
    - type: accuracy
      value: 0.5125
      name: Exact Match Accuracy
      verified: true
    - type: accuracy
      value: 0.8900
      name: Subset Accuracy
      verified: true
---

# Polish Twitter Emotion Classifier (RoBERTa-8k)

## Model Description

This model is a fine-tuned version of [PKOBP/polish-roberta-8k](https://huggingface.co/PKOBP/polish-roberta-8k) for multi-label emotion and sentiment classification in Polish. It was trained on the [TwitterEmo-PL-Refined](https://huggingface.co/datasets/yazoniak/TwitterEmo-PL-Refined) dataset.

The model predicts 8 emotion and sentiment labels simultaneously:

- **Emotions**: `radość` (joy), `wstręt` (disgust), `gniew` (anger), `przeczuwanie` (anticipation)
- **Sentiment**: `pozytywny` (positive), `negatywny` (negative), `neutralny` (neutral)
- **Special**: `sarkazm` (sarcasm)

### Model Details

- **Model type**: RoBERTa (Polish)
- **Language**: Polish
- **Base model**: [PKOBP/polish-roberta-8k](https://huggingface.co/PKOBP/polish-roberta-8k)
- **Task**: Multi-label text classification (emotion & sentiment)
- **Training data**: 35,921 Polish tweets from TwitterEmo-PL-Refined
- **License**: GPL-3.0
- **Context window**: 8,192 tokens (max; for tweet-length texts you can use a smaller tokenizer `max_length`, e.g., 256-1024)

## Intended Use

### Primary Use Cases

- **Social media monitoring**: Analyze emotions and sentiment in Polish tweets and social media posts
- **Customer feedback analysis**: Understand emotional responses in Polish customer reviews
- **Research**: Study emotion expression patterns in Polish language social media
- **Multi-label sentiment analysis**: Capture nuanced emotional states beyond binary positive/negative

### Out-of-Scope Use

- This model is specifically trained on Polish Twitter data and may not generalize well to:
  - Formal Polish text (news articles, academic writing)
  - Other languages
  - Very long documents (optimal for tweet-length texts)

## Performance

### Overall Metrics

| Metric | Score |
|--------|-------|
| **F1 Macro** | **0.8500** |
| **F1 Micro** | **0.8900** |
| **F1 Weighted** | **0.8895** |
| **Exact Match Accuracy** | **0.5125** |
| **Subset Accuracy** | **0.8900** |
| **Validation Loss** | **0.2761** |
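
The gap between the F1 scores and exact-match accuracy is typical for multi-label tasks: a prediction counts as an exact match only when all 8 labels are right at once. The sketch below shows one common way these metrics are computed, on toy data with 3 labels. It is an illustration under assumed definitions, not the evaluation code used for this model, and the function name `multilabel_metrics` is hypothetical.

```python
def multilabel_metrics(y_true, y_pred):
    """Compute macro F1, micro F1 and exact-match accuracy for 0/1 label matrices."""
    n_labels = len(y_true[0])
    per_label_f1 = []
    tp_total = fp_total = fn_total = 0
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        per_label_f1.append(2 * tp / denom if denom else 0.0)
        tp_total += tp
        fp_total += fp
        fn_total += fn
    micro_denom = 2 * tp_total + fp_total + fn_total
    return {
        # Macro: unweighted mean of the per-label F1 scores.
        "f1_macro": sum(per_label_f1) / n_labels,
        # Micro: F1 over the pooled counts of all labels.
        "f1_micro": 2 * tp_total / micro_denom if micro_denom else 0.0,
        # Exact match: every label of a sample must be correct.
        "exact_match": sum(
            1 for t, p in zip(y_true, y_pred) if list(t) == list(p)
        ) / len(y_true),
    }


# Toy data: 3 samples, 3 labels; only one label of the first sample is wrong.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
metrics = multilabel_metrics(y_true, y_pred)
print(metrics)
```

A single wrong label lowers micro F1 only slightly, but it removes the whole sample from the exact-match count.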

### Per-Label Performance

| Label | F1 Score | Label Frequency |
|-------|----------|-----------------|
| **negatywny** (negative) | **0.8553** | 42.4% |
| **neutralny** (neutral) | **0.8172** | 41.0% |
| **pozytywny** (positive) | **0.7814** | 17.4% |
| **gniew** (anger) | **0.7693** | 25.8% |
| **radość** (joy) | **0.7476** | 11.9% |
| **wstręt** (disgust) | **0.7337** | 20.4% |
| **przeczuwanie** (anticipation) | **0.7220** | 21.6% |
| **sarkazm** (sarcasm) | **0.5337** | 16.0% |

## Training Details

### Training Data

The model was trained on [TwitterEmo-PL-Refined](https://huggingface.co/datasets/yazoniak/TwitterEmo-PL-Refined), which contains:

- **Total samples**: 35,921 Polish tweets
- **Label distribution**:
  - `negatywny`: 15,231 samples (42.4%)
  - `neutralny`: 14,720 samples (41.0%)
  - `gniew`: 9,252 samples (25.8%)
  - `przeczuwanie`: 7,776 samples (21.6%)
  - `wstręt`: 7,337 samples (20.4%)
  - `pozytywny`: 6,248 samples (17.4%)
  - `sarkazm`: 5,756 samples (16.0%)
  - `radość`: 4,283 samples (11.9%)
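
Because each tweet can carry several labels, the percentages above add up to more than 100%. The short check below recomputes the shares from the listed counts and derives the average number of labels per tweet. It is a minimal sketch; the variable names are illustrative.

```python
# Label counts as listed above (out of 35,921 tweets).
TOTAL_TWEETS = 35_921
LABEL_COUNTS = {
    "negatywny": 15_231,
    "neutralny": 14_720,
    "gniew": 9_252,
    "przeczuwanie": 7_776,
    "wstręt": 7_337,
    "pozytywny": 6_248,
    "sarkazm": 5_756,
    "radość": 4_283,
}

# Share of tweets carrying each label.
shares = {label: count / TOTAL_TWEETS for label, count in LABEL_COUNTS.items()}

# Total label assignments divided by the number of tweets.
labels_per_tweet = sum(LABEL_COUNTS.values()) / TOTAL_TWEETS

for label, share in shares.items():
    print(f"{label:<13} {share:6.1%}")
print(f"Average labels per tweet: {labels_per_tweet:.2f}")
```

The counts work out to roughly two labels per tweet, for example one sentiment label plus one emotion.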

### Training Configuration

```text
Model: PKOBP/polish-roberta-8k
Training samples: 28,737 (80%)
Validation samples: 7,184 (20%)

Hyperparameters:
- Learning rate: 1e-5
- Batch size: 32 (train), 32 (eval)
- Epochs: 4
- Weight decay: 0.03
- Warmup ratio: 0.1
- Dropout rate: 0.2
- Max gradient norm: 1.0
- Optimizer: AdamW
- LR scheduler: Cosine with warmup
- Early stopping patience: 3
- Mixed precision: BF16

Training strategy:
- Save strategy: Every 200 steps
- Evaluation strategy: Every 200 steps
- Best model selection: F1 Macro
- Total training steps: 3,600
- Best checkpoint: 3,400
```
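
The step counts follow from the sample count, batch size and number of epochs. The sketch below reproduces the arithmetic under assumptions the card does not state: a single device, no gradient accumulation, and the last partial batch of each epoch kept. Under those assumptions the total comes out just under the rounded 3,600 listed above.

```python
import math

TRAIN_SAMPLES = 28_737
BATCH_SIZE = 32
EPOCHS = 4
EVAL_EVERY = 200  # evaluation and checkpointing interval, in optimizer steps

# Assumption: one optimizer step per batch, last partial batch included.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

# Steps at which an evaluation (and checkpoint) would be triggered.
eval_steps = list(range(EVAL_EVERY, total_steps + 1, EVAL_EVERY))

print(f"Steps per epoch: {steps_per_epoch}")
print(f"Total steps: {total_steps}")
print(f"Evaluations: {len(eval_steps)}, last at step {eval_steps[-1]}")
```

With these numbers the last scheduled evaluation falls at step 3,400, which is also the checkpoint reported as best.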

### Training Process

Training was conducted on a single NVIDIA RTX 3090 GPU using a stratified 80/20 train-validation split with the following progression:

![Training Progress](training_plots.png)

## Calibration

The model's predictions can be improved using **temperature scaling** and **optimized thresholds**. The calibration analysis is summarized below.

### Temperature Scaling Results

Per-label temperature scaling changes the calibration error, measured as Expected Calibration Error (ECE), as follows:

| Label | Temperature | ECE Before | ECE After | Improvement |
|-------|------------|------------|-----------|-------------|
| `radość` | 1.066 | 0.0163 | 0.0166 | -1.8% |
| `wstręt` | 1.117 | 0.0211 | 0.0152 | **+27.9%** |
| `gniew` | 1.186 | 0.0308 | 0.0194 | **+37.0%** |
| `przeczuwanie` | 1.102 | 0.0228 | 0.0237 | -3.9% |
| `pozytywny` | 1.181 | 0.0280 | 0.0293 | -4.6% |
| `negatywny` | 1.437 | 0.0594 | 0.0345 | **+41.9%** |
| `neutralny` | 1.472 | 0.0696 | 0.0390 | **+44.0%** |
| `sarkazm` | 1.078 | 0.0202 | 0.0202 | 0.0% |

**Key findings:**

- `neutralny`, `negatywny`, and `gniew` benefit most from temperature scaling
- Some labels (`radość`, `przeczuwanie`, `pozytywny`) show minor degradation
- Overall, ECE drops by 28-44% for four labels (`wstręt`, `gniew`, `negatywny`, `neutralny`) and changes by at most 0.0013 for the other four
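
For readers unfamiliar with the two quantities in the table, here is a minimal sketch of temperature scaling and of one common ECE formulation for a single binary label (equal-width bins over the predicted probability). The binning scheme and the function names are assumptions; the card does not specify how its ECE values were computed.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def calibrated_probability(logit: float, temperature: float) -> float:
    """Temperature scaling: divide the logit by T, then apply the sigmoid."""
    return sigmoid(logit / temperature)


def expected_calibration_error(probs, targets, n_bins: int = 10) -> float:
    """Weighted mean |observed positive rate - mean predicted probability| over bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, targets):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_prob = sum(p for p, _ in bucket) / len(bucket)
        positive_rate = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(positive_rate - mean_prob)
    return ece


# A temperature above 1 pulls probabilities towards 0.5 (less confident).
raw = sigmoid(2.0)
scaled = calibrated_probability(2.0, 1.472)  # temperature listed for `neutralny`
print(f"raw={raw:.4f} scaled={scaled:.4f}")
```

All temperatures in the table are above 1, so scaling softens the raw probabilities; the effect is largest for `negatywny` and `neutralny`, which have the highest temperatures.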

### Optimized Decision Thresholds

Per-label F1-optimized thresholds (vs. default 0.5):

| Label | Optimal Threshold | F1 @ Optimal | F1 @ 0.5 | Improvement |
|-------|------------------|--------------|----------|-------------|
| `neutralny` | **0.330** | **0.8211** | 0.8110 | **+1.00%** |
| `sarkazm` | **0.330** | **0.5766** | 0.5256 | **+5.10%** |
| `przeczuwanie` | 0.410 | 0.7276 | 0.7187 | +0.89% |
| `gniew` | 0.440 | 0.7692 | 0.7676 | +0.16% |
| `negatywny` | 0.450 | 0.8516 | 0.8511 | +0.05% |
| `wstręt` | 0.460 | 0.7477 | 0.7464 | +0.13% |
| `pozytywny` | 0.510 | 0.7864 | 0.7859 | +0.04% |
| `radość` | 0.560 | 0.7572 | 0.7558 | +0.14% |

**Key findings:**

- `sarkazm` shows the largest improvement (+5.10%) with a lower threshold (0.33)
- `neutralny` also benefits significantly (+1.00%) from a lower threshold (0.33)
- Most labels perform optimally near the default 0.5 threshold
- Total improvement with optimized thresholds: **~0.5-1.0% F1 Macro**
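
Thresholds like those in the table can be found with a simple grid search on validation data: for each label, try a range of cut-offs and keep the one with the highest F1. The sketch below illustrates the idea on toy data for one label. The grid, the function names and the data are assumptions, not the procedure used to produce the table.

```python
def f1_score(y_true, y_pred) -> float:
    """Binary F1 from 0/1 targets and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(probs, y_true, grid=None):
    """Return (threshold, F1) maximizing F1; the default 0.5 is the baseline."""
    if grid is None:
        grid = [i / 100 for i in range(5, 96)]  # 0.05, 0.06, ..., 0.95
    best_t = 0.5
    best_f1 = f1_score(y_true, [int(p > best_t) for p in probs])
    for t in grid:
        f1 = f1_score(y_true, [int(p > t) for p in probs])
        if f1 > best_f1:  # strict: keep the first threshold that reaches the best F1
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy validation data for one label: two positives score just below 0.5.
probs = [0.9, 0.45, 0.4, 0.2, 0.1]
y_true = [1, 1, 1, 0, 0]
best_t, best_f1 = best_threshold(probs, y_true)
print(f"best threshold={best_t:.2f}, F1={best_f1:.2f}")
```

On this toy data the default threshold misses two of the three positives, and a lower cut-off recovers them. This is the same mechanism that helps `sarkazm` and `neutralny` above.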

### Calibration Files

The model repository includes:

- **Base model**: `model.safetensors` - Use with default threshold (0.5)
- **Calibration artifacts**: `calibration_artifacts.json` - Contains temperature parameters and optimal thresholds

![Calibration Reliability Diagrams](calibration_reliability_diagrams.png)

**Recommendation**: For production use, apply both temperature scaling and optimized thresholds for best performance.

## Model Files

This repository contains:

- **Model weights**: `model.safetensors` - Fine-tuned RoBERTa model
- **Tokenizer**: `tokenizer.json`, `tokenizer_config.json` - Polish RoBERTa tokenizer
- **Configuration**: `config.json` - Model configuration
- **Calibration**: `calibration_artifacts.json` - Temperature scaling parameters and optimal thresholds
- **Inference scripts**:
  - `predict.py` - Basic inference (threshold: 0.5)
  - `predict_calibrated.py` - Calibrated inference (recommended)
- **Training artifacts**: `training_plots`, `calibration_reliability_diagrams`
- **Requirements**: `requirements.txt` - Python dependencies
- **License**: `LICENSE` - Full GPL-3.0 license text

### Installation

```bash
pip install -r requirements.txt
```

Or install dependencies manually:

```bash
pip install transformers torch numpy
```

## Usage

### Important: Text Preprocessing

**The model expects @mentions to be anonymized**, as they were during training. Both inference scripts automatically replace all `@username` mentions with `@anonymized_account` to match the training data distribution.

### Quick Start (Basic Inference)

Use the `predict.py` script for basic inference with default threshold (0.5):

```bash
# From Hugging Face (default) - mentions are automatically anonymized
python predict.py "Uwielbiam czekać na peronie 3 godziny! Gratulacje dla #zgp"

# Example with mentions
python predict.py "@zgp_intervillage Uwielbiam czekać na peronie 3 godziny! Gratulacje dla #zgp"
# Preprocessed internally: "@anonymized_account Uwielbiam czekać..."

# From local model
python predict.py "Uwielbiam czekać na peronie 3 godziny! Gratulacje dla #zgp" --model-path ./

# With custom threshold
python predict.py "Uwielbiam czekać na peronie 3 godziny! Gratulacje dla #zgp" --model-path ./ --threshold 0.3
```

**Example Output:**

```
Loading model from: yazoniak/twitter-emotion-pl-classifier

Input text: Uwielbiam czekać na peronie 3 godziny! Gratulacje dla #zgp

Assigned Labels:
----------------------------------------
radość
pozytywny
sarkazm

All Labels (with probabilities):
----------------------------------------
✓ radość : 0.9574
wstręt : 0.0566
gniew : 0.0516
przeczuwanie : 0.0347
✓ pozytywny : 0.9782
negatywny : 0.0602
neutralny : 0.0336
✓ sarkazm : 0.5404
```

### With Calibration

Use the `predict_calibrated.py` script for calibrated inference with temperature scaling and optimized thresholds:

```bash
# From Hugging Face with calibration (requires calibration_artifacts.json)
python predict_calibrated.py "Uwielbiam czekać na peronie 3 godziny! Gratulacje dla #zgp"
```

### Python API Usage

```python
import re

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification


def preprocess_text(text):
    """Preprocess text to match training data format."""
    # Anonymize @mentions (IMPORTANT for best performance)
    text = re.sub(r'@\w+', '@anonymized_account', text)
    return text


# Load model
model_name = "yazoniak/twitter-emotion-pl-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Get labels from model config
labels = [model.config.id2label[i] for i in range(model.config.num_labels)]

# Prepare input with preprocessing
text = "@jan_kowalski To jest wspaniały dzień!"
preprocessed_text = preprocess_text(text)  # "@anonymized_account To jest wspaniały dzień!"
inputs = tokenizer(preprocessed_text, return_tensors="pt", truncation=True, max_length=8192)

# Inference (no gradients needed)
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Get independent per-label probabilities (multi-label, so sigmoid, not softmax)
probabilities = torch.sigmoid(logits).squeeze(0).numpy()

# Apply threshold: keep only labels whose probability exceeds it
threshold = 0.5
predictions = {
    label: float(prob)
    for label, prob in zip(labels, probabilities)
    if prob > threshold
}

print(predictions)
# Example output: {'radość': 0.8734, 'pozytywny': 0.9156}
```

### Interpretation

The model outputs logits for each of the 8 labels. To get predictions:

1. **Without calibration**: Apply sigmoid, then threshold at 0.5
1. **With calibration**:
   - Apply temperature scaling (divide each label's logit by its temperature)
   - Apply sigmoid to the scaled logits
   - Apply per-label optimized thresholds
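
The calibrated path can be sketched in a few lines of plain Python. The temperatures and thresholds below are copied from the tables in the Calibration section. Two assumptions apply: the thresholds are used on the temperature-scaled probabilities, and the values are hard-coded here instead of being read from `calibration_artifacts.json`, whose schema is not documented in this card. The function name and the example logits are hypothetical.

```python
import math

# Per-label temperatures and F1-optimized thresholds from the Calibration tables.
TEMPERATURES = {
    "radość": 1.066, "wstręt": 1.117, "gniew": 1.186, "przeczuwanie": 1.102,
    "pozytywny": 1.181, "negatywny": 1.437, "neutralny": 1.472, "sarkazm": 1.078,
}
THRESHOLDS = {
    "radość": 0.560, "wstręt": 0.460, "gniew": 0.440, "przeczuwanie": 0.410,
    "pozytywny": 0.510, "negatywny": 0.450, "neutralny": 0.330, "sarkazm": 0.330,
}


def predict_calibrated(logits: dict) -> dict:
    """Map {label: raw logit} to {label: calibrated probability} for assigned labels."""
    assigned = {}
    for label, logit in logits.items():
        # 1) temperature scaling, 2) sigmoid, 3) per-label threshold
        prob = 1.0 / (1.0 + math.exp(-logit / TEMPERATURES[label]))
        if prob > THRESHOLDS[label]:
            assigned[label] = prob
    return assigned


# Made-up logits for illustration only.
example_logits = {"radość": 3.0, "sarkazm": -0.5, "neutralny": -1.5}
selected = predict_calibrated(example_logits)
print(selected)
```

In this example `sarkazm` has a calibrated probability below 0.5 but above its 0.33 threshold, so it is assigned here even though the uncalibrated default would drop it.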

## Limitations and Biases

### Known Limitations

1. **Preprocessing required**: The model expects `@mentions` to be anonymized as `@anonymized_account` (matching training data). The provided inference scripts handle this automatically, but custom implementations must include this preprocessing step for optimal performance.

1. **Sarcasm detection**: The model struggles with Polish sarcasm (F1: 0.53), which is inherently difficult for BERT-style models to detect from text alone, without additional context.

1. **Class imbalance**: Performance varies with label frequency:

   - High-frequency labels (`negatywny`, `neutralny`) perform best
   - Low-frequency labels (`radość`, `sarkazm`) show lower F1 scores

1. **Twitter-specific**: The model is optimized for tweet-length texts with informal language, hashtags, and mentions. The 8,192-token context window accepts longer inputs, but such texts fall outside the training distribution.

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{yazoniak2025twitteremotionpl,
  title={Polish Twitter Emotion Classifier (RoBERTa-8k)},
  author={yazoniak},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/yazoniak/twitter-emotion-pl-classifier}
}
```

Also cite the dataset and the original TwitterEmo paper:

```bibtex
@dataset{yazoniak_twitteremo_pl_refined_2025,
  title = {TwitterEmo-PL-Refined: Polish Twitter Emotions (8 labels, refined)},
  author = {yazoniak},
  year = {2025},
  url = {https://huggingface.co/datasets/yazoniak/TwitterEmo-PL-Refined}
}

@inproceedings{bogdanowicz2023twitteremo,
  title = {TwitterEmo: Annotating Emotions and Sentiment in Polish Twitter},
  author = {Bogdanowicz, S. and Cwynar, H. and Zwierzchowska, A. and Klamra, C. and Kiera{\'s}, W. and Kobyli{\'n}ski, {\L}.},
  booktitle = {Computational Science -- ICCS 2023},
  series = {Lecture Notes in Computer Science},
  volume = {14074},
  publisher = {Springer, Cham},
  year = {2023},
  doi = {10.1007/978-3-031-36021-3_20}
}
```

## Acknowledgments

- **Base model**: [PKOBP/polish-roberta-8k](https://huggingface.co/PKOBP/polish-roberta-8k)
- **Original dataset**: [CLARIN-PL TwitterEmo](https://huggingface.co/datasets/clarin-pl/twitteremo)
- **Label cleaning**: Cleanlab library for noise detection
- **LLM assistance**: Gemini-2.5-Flash and GPT-4.1 for label review

## License

### License Terms

This model is released under the **GNU General Public License v3.0 (GPL-3.0)**, inherited from the training dataset.

**License Chain:**

- **Base Model** ([PKOBP/polish-roberta-8k](https://huggingface.co/PKOBP/polish-roberta-8k)): Apache-2.0
- **Training Dataset** ([TwitterEmo-PL-Refined](https://huggingface.co/datasets/yazoniak/TwitterEmo-PL-Refined)): GPL-3.0
- **Original Dataset** ([clarin-pl/twitteremo](https://huggingface.co/datasets/clarin-pl/twitteremo)): GPL-3.0
- **This Fine-tuned Model**: **GPL-3.0** (inherited from training data)

### Full License Text

The complete GPL-3.0 license text is available in the [LICENSE](LICENSE) file in this repository, or at: https://www.gnu.org/licenses/gpl-3.0.html

## Model Card Contact

For questions, issues, or feedback about this model, please open an issue in the model repository or contact the author through Hugging Face.

______________________________________________________________________

**Model Version**: v1.0
**Last Updated**: 2025-10-10