---
language:
- pl
license: gpl-3.0
tags:
- text-classification
- emotion-classification
- sentiment-analysis
- polish
- multi-label-classification
- twitter
datasets:
- yazoniak/TwitterEmo-PL-Refined
base_model: PKOBP/polish-roberta-8k
metrics:
- f1
- accuracy
pipeline_tag: text-classification
model-index:
- name: twitter-emotion-pl-classifier
  results:
  - task:
      type: text-classification
      name: Multi-Label Emotion Classification
    dataset:
      type: yazoniak/TwitterEmo-PL-Refined
      name: TwitterEmo-PL-Refined
      split: validation
    metrics:
    - type: f1
      value: 0.8500
      name: F1 Macro
      verified: true
      args:
        average: macro
    - type: f1
      value: 0.8900
      name: F1 Micro
      verified: true
      args:
        average: micro
    - type: f1
      value: 0.8895
      name: F1 Weighted
      verified: true
      args:
        average: weighted
    - type: accuracy
      value: 0.5125
      name: Exact Match Accuracy
      verified: true
    - type: accuracy
      value: 0.8900
      name: Subset Accuracy
      verified: true
---

# Polish Twitter Emotion Classifier (RoBERTa-8k)

## Model Description

This model is a fine-tuned version of [PKOBP/polish-roberta-8k](https://huggingface.co/PKOBP/polish-roberta-8k) for multi-label emotion and sentiment classification in Polish. It was trained on the [TwitterEmo-PL-Refined](https://huggingface.co/datasets/yazoniak/TwitterEmo-PL-Refined) dataset.
The model predicts 8 emotion and sentiment labels simultaneously:

- **Emotions**: `radość` (joy), `wstręt` (disgust), `gniew` (anger), `przeczuwanie` (anticipation)
- **Sentiment**: `pozytywny` (positive), `negatywny` (negative), `neutralny` (neutral)
- **Special**: `sarkazm` (sarcasm)

### Model Details

- **Model type**: RoBERTa (Polish)
- **Language**: Polish
- **Base model**: [PKOBP/polish-roberta-8k](https://huggingface.co/PKOBP/polish-roberta-8k)
- **Task**: Multi-label text classification (emotion & sentiment)
- **Training data**: 35,921 Polish tweets from TwitterEmo-PL-Refined
- **License**: GPL-3.0
- **Context window**: 8,192 tokens (max; for tweet-length texts you can use a smaller tokenizer `max_length`, e.g., 256-1024)

## Intended Use

### Primary Use Cases

- **Social media monitoring**: Analyze emotions and sentiment in Polish tweets and social media posts
- **Customer feedback analysis**: Understand emotional responses in Polish customer reviews
- **Research**: Study emotion expression patterns in Polish-language social media
- **Multi-label sentiment analysis**: Capture nuanced emotional states beyond binary positive/negative

### Out-of-Scope Use

This model is trained specifically on Polish Twitter data and may not generalize well to:

- Formal Polish text (news articles, academic writing)
- Other languages
- Very long documents (optimal for tweet-length texts)

## Performance

### Overall Metrics

| Metric | Score |
|--------|-------|
| **F1 Macro** | **0.8500** |
| **F1 Micro** | **0.8900** |
| **F1 Weighted** | **0.8895** |
| **Exact Match Accuracy** | **0.5125** |
| **Subset Accuracy** | **0.8900** |
| **Validation Loss** | **0.2761** |

### Per-Label Performance

| Label | F1 Score | Coverage |
|-------|----------|----------|
| **negatywny** (negative) | **0.8553** | 42.4% |
| **neutralny** (neutral) | **0.8172** | 41.0% |
| **pozytywny** (positive) | **0.7814** | 17.4% |
| **gniew** (anger) | **0.7693** | 25.8% |
| **radość** (joy) | **0.7476** | 11.9% |
| **wstręt** (disgust) | **0.7337** | 20.4% |
| **przeczuwanie** (anticipation) | **0.7220** | 21.6% |
| **sarkazm** (sarcasm) | **0.5337** | 16.0% |

## Training Details

### Training Data

The model was trained on [TwitterEmo-PL-Refined](https://huggingface.co/datasets/yazoniak/TwitterEmo-PL-Refined), which contains:

- **Total samples**: 35,921 Polish tweets
- **Label distribution**:
  - `negatywny`: 15,231 samples (42.4%)
  - `neutralny`: 14,720 samples (41.0%)
  - `gniew`: 9,252 samples (25.8%)
  - `przeczuwanie`: 7,776 samples (21.6%)
  - `wstręt`: 7,337 samples (20.4%)
  - `pozytywny`: 6,248 samples (17.4%)
  - `sarkazm`: 5,756 samples (16.0%)
  - `radość`: 4,283 samples (11.9%)

### Training Configuration

```
Model: PKOBP/polish-roberta-8k
Training samples: 28,737 (80%)
Validation samples: 7,184 (20%)

Hyperparameters:
- Learning rate: 1e-5
- Batch size: 32 (train), 32 (eval)
- Epochs: 4
- Weight decay: 0.03
- Warmup ratio: 0.1
- Dropout rate: 0.2
- Max gradient norm: 1.0
- Optimizer: AdamW
- LR scheduler: Cosine with warmup
- Early stopping patience: 3
- Mixed precision: BF16

Training strategy:
- Save strategy: Every 200 steps
- Evaluation strategy: Every 200 steps
- Best model selection: F1 Macro
- Total training steps: 3,600
- Best checkpoint: 3,400
```

### Training Process

Training ran on a single NVIDIA RTX 3090 GPU with a stratified 80/20 train-validation split. The plot below shows the training progression:

![Training Progress](training_plots.png)

## Calibration

The model's predictions can be improved using **temperature scaling** and **optimized thresholds**.
Calibration analysis shows:

### Temperature Scaling Results

Per-label temperature scaling reduces calibration error (Expected Calibration Error, ECE):

| Label | Temperature | ECE Before | ECE After | Improvement |
|-------|------------|------------|-----------|-------------|
| `radość` | 1.066 | 0.0163 | 0.0166 | -1.8% |
| `wstręt` | 1.117 | 0.0211 | 0.0152 | **+27.9%** |
| `gniew` | 1.186 | 0.0308 | 0.0194 | **+37.0%** |
| `przeczuwanie` | 1.102 | 0.0228 | 0.0237 | -3.9% |
| `pozytywny` | 1.181 | 0.0280 | 0.0293 | -4.6% |
| `negatywny` | 1.437 | 0.0594 | 0.0345 | **+41.9%** |
| `neutralny` | 1.472 | 0.0696 | 0.0390 | **+44.0%** |
| `sarkazm` | 1.078 | 0.0202 | 0.0202 | 0.0% |

**Key findings:**

- `neutralny`, `negatywny`, and `gniew` benefit most from temperature scaling
- Some labels (`radość`, `przeczuwanie`, `pozytywny`) show minor degradation
- Overall, calibration significantly improves probability reliability

### Optimized Decision Thresholds

Per-label F1-optimized thresholds (vs. default 0.5):

| Label | Optimal Threshold | F1 @ Optimal | F1 @ 0.5 | Improvement |
|-------|------------------|--------------|----------|-------------|
| `neutralny` | **0.330** | **0.8211** | 0.8110 | **+1.00%** |
| `sarkazm` | **0.330** | **0.5766** | 0.5256 | **+5.10%** |
| `przeczuwanie` | 0.410 | 0.7276 | 0.7187 | +0.89% |
| `gniew` | 0.440 | 0.7692 | 0.7676 | +0.16% |
| `negatywny` | 0.450 | 0.8516 | 0.8511 | +0.05% |
| `wstręt` | 0.460 | 0.7477 | 0.7464 | +0.13% |
| `pozytywny` | 0.510 | 0.7864 | 0.7859 | +0.04% |
| `radość` | 0.560 | 0.7572 | 0.7558 | +0.14% |

**Key findings:**

- `sarkazm` shows the largest improvement (+5.10%) with a lower threshold (0.33)
- `neutralny` also benefits significantly (+1.00%) from a lower threshold (0.33)
- Most labels perform optimally near the default 0.5 threshold
- Total improvement with optimized thresholds: **~0.5-1.0% F1 Macro**

### Calibration Files

The model repository includes:

- **Base model**: `model.safetensors` - Use with default threshold (0.5)
- **Calibration artifacts**: `calibration_artifacts.json` - Contains temperature parameters and optimal thresholds

![Reliability diagrams](calibration_reliability_diagrams.png)

**Recommendation**: For production use, apply both temperature scaling and optimized thresholds for best performance.
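
The recommendation above can be sketched in a few lines of NumPy. This is a minimal sketch, not the logic of `predict_calibrated.py`: the temperatures and thresholds are copied from the two tables in this section rather than read from `calibration_artifacts.json` (whose layout is not documented in this card), the `calibrated_predict` helper is hypothetical, and it assumes the optimized thresholds are meant to be applied to the temperature-scaled probabilities.

```python
import numpy as np

# Per-label temperatures and thresholds copied from the tables in this section.
# Assumption: hard-coded for illustration; the shipped values live in
# calibration_artifacts.json, whose layout is not shown in this card.
TEMPERATURES = {
    "radość": 1.066, "wstręt": 1.117, "gniew": 1.186, "przeczuwanie": 1.102,
    "pozytywny": 1.181, "negatywny": 1.437, "neutralny": 1.472, "sarkazm": 1.078,
}
THRESHOLDS = {
    "radość": 0.560, "wstręt": 0.460, "gniew": 0.440, "przeczuwanie": 0.410,
    "pozytywny": 0.510, "negatywny": 0.450, "neutralny": 0.330, "sarkazm": 0.330,
}


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


def calibrated_predict(logits_by_label):
    """Hypothetical helper: map {label: raw logit} to {label: calibrated prob}
    for every label whose calibrated probability reaches its threshold."""
    selected = {}
    for label, logit in logits_by_label.items():
        # Temperature scaling: divide the logit by T before the sigmoid.
        prob = float(sigmoid(logit / TEMPERATURES[label]))
        if prob >= THRESHOLDS[label]:
            selected[label] = prob
    return selected


# Usage with made-up logits (not real model output).
example_logits = {label: 0.0 for label in TEMPERATURES}
example_logits["sarkazm"] = -0.5
print(calibrated_predict(example_logits))
```

Dividing a logit by a temperature above 1 pulls the resulting probability toward 0.5, so the scaled scores are less extreme than the raw sigmoid outputs; the per-label thresholds then decide which labels are assigned.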
## Model Files

This repository contains:

- **Model weights**: `model.safetensors` - Fine-tuned RoBERTa model
- **Tokenizer**: `tokenizer.json`, `tokenizer_config.json` - Polish RoBERTa tokenizer
- **Configuration**: `config.json` - Model configuration
- **Calibration**: `calibration_artifacts.json` - Temperature scaling parameters and optimal thresholds
- **Inference scripts**:
  - `predict.py` - Basic inference (threshold: 0.5)
  - `predict_calibrated.py` - Calibrated inference (recommended)
- **Training artifacts**: `training_plots.png`, `calibration_reliability_diagrams.png`
- **Requirements**: `requirements.txt` - Python dependencies
- **License**: `LICENSE` - Full GPL-3.0 license text

### Installation

```bash
pip install -r requirements.txt
```

Or install dependencies manually:

```bash
pip install transformers torch numpy
```

## Usage

### Important: Text Preprocessing

**The model expects @mentions to be anonymized**, as they were during training. Both inference scripts automatically replace all `@username` mentions with `@anonymized_account` to match the training data distribution.

### Quick Start (Basic Inference)

Use the `predict.py` script for basic inference with the default threshold (0.5):

```bash
# From Hugging Face (default) - mentions are automatically anonymized
python predict.py "Uwielbiam czekać na peronie 3 godziny! Gratulacje dla #zgp"

# Example with mentions
python predict.py "@zgp_intervillage Uwielbiam czekać na peronie 3 godziny! Gratulacje dla #zgp"
# Preprocessed internally: "@anonymized_account Uwielbiam czekać..."

# From local model
python predict.py "Uwielbiam czekać na peronie 3 godziny! Gratulacje dla #zgp" --model-path ./

# With custom threshold
python predict.py "Uwielbiam czekać na peronie 3 godziny! Gratulacje dla #zgp" --model-path ./ --threshold 0.3
```

**Example Output:**

```
Loading model from: yazoniak/twitter-emotion-pl-classifier
Input text: Uwielbiam czekać na peronie 3 godziny! Gratulacje dla #zgp

Assigned Labels:
----------------------------------------
  radość
  pozytywny
  sarkazm

All Labels (with probabilities):
----------------------------------------
✓ radość          : 0.9574
  wstręt          : 0.0566
  gniew           : 0.0516
  przeczuwanie    : 0.0347
✓ pozytywny       : 0.9782
  negatywny       : 0.0602
  neutralny       : 0.0336
✓ sarkazm         : 0.5404
```

### With Calibration

Use the `predict_calibrated.py` script for calibrated inference with temperature scaling and optimized thresholds:

```bash
# From Hugging Face with calibration (requires calibration_artifacts.json)
python predict_calibrated.py "Uwielbiam czekać na peronie 3 godziny! Gratulacje dla #zgp"
```

### Python API Usage

```python
import re

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer


def preprocess_text(text):
    """Preprocess text to match training data format."""
    # Anonymize @mentions (IMPORTANT for best performance)
    return re.sub(r'@\w+', '@anonymized_account', text)


# Load model
model_name = "yazoniak/twitter-emotion-pl-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Get labels from model config
labels = [model.config.id2label[i] for i in range(model.config.num_labels)]

# Prepare input with preprocessing
text = "@jan_kowalski To jest wspaniały dzień!"
preprocessed_text = preprocess_text(text)  # "@anonymized_account To jest wspaniały dzień!"
inputs = tokenizer(preprocessed_text, return_tensors="pt", truncation=True, max_length=8192)

# Inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Get per-label probabilities (multi-label, so sigmoid rather than softmax)
probabilities = torch.sigmoid(logits).squeeze(0).numpy()

# Apply threshold
threshold = 0.5
predictions = {
    label: float(prob)
    for label, prob in zip(labels, probabilities)
    if prob > threshold
}
print(predictions)
# Output: {'radość': 0.8734, 'pozytywny': 0.9156}
```

### Interpretation

The model outputs logits for each of the 8 labels. To get predictions:

1. **Without calibration**: Apply sigmoid, then threshold at 0.5
1. **With calibration**:
   - Divide each label's logit by that label's temperature
   - Apply sigmoid to the scaled logits
   - Apply the per-label optimized thresholds

## Limitations and Biases

### Known Limitations

1. **Preprocessing required**: The model expects `@mentions` to be anonymized as `@anonymized_account` (matching training data). The provided inference scripts handle this automatically, but custom implementations must include this preprocessing step for optimal performance.
1. **Sarcasm detection**: The model struggles with Polish sarcasm (F1: 0.53), which is inherently hard for BERT-style models to detect from text alone, without additional context.
1. **Class imbalance**: Performance varies with label frequency:
   - High-frequency labels (`negatywny`, `neutralny`) perform best
   - Low-frequency labels (`radość`, `sarkazm`) show lower F1 scores
1. **Twitter-specific**: The model is optimized for short, tweet-length texts with informal language, hashtags, and mentions. The 8,192-token context window is an upper bound on input length, not the length the model was tuned for.
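
For custom pipelines, the anonymization step from limitation 1 can be applied to a whole batch before tokenization. A minimal sketch, assuming the same `@\w+` pattern used in the Python API example; `anonymize_mentions` is a hypothetical helper, not a function shipped in this repository, and the inference scripts may use a different pattern.

```python
import re

# Same pattern as in the Python API example; assumed, not confirmed,
# to match what the bundled inference scripts use.
MENTION_PATTERN = re.compile(r"@\w+")
PLACEHOLDER = "@anonymized_account"


def anonymize_mentions(texts):
    """Replace every @mention in each text with the training-time placeholder."""
    return [MENTION_PATTERN.sub(PLACEHOLDER, text) for text in texts]


# Usage on a small batch of made-up tweets.
batch = [
    "@jan_kowalski To jest wspaniały dzień!",
    "Brak wzmianek w tym tweecie.",
]
print(anonymize_mentions(batch))
```

Note that this pattern also rewrites the part after `@` in an email address (`jan@example.com` becomes `jan@anonymized_account.com`); whether that matches the training-time preprocessing is not stated in this card.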
## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{yazoniak2025twitteremotionpl,
  title={Polish Twitter Emotion Classifier (RoBERTa-8k)},
  author={yazoniak},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/yazoniak/twitter-emotion-pl-classifier}
}
```

Also cite the dataset and the original TwitterEmo paper:

```bibtex
@dataset{yazoniak_twitteremo_pl_refined_2025,
  title = {TwitterEmo-PL-Refined: Polish Twitter Emotions (8 labels, refined)},
  author = {yazoniak},
  year = {2025},
  url = {https://huggingface.co/datasets/yazoniak/TwitterEmo-PL-Refined}
}

@inproceedings{bogdanowicz2023twitteremo,
  title = {TwitterEmo: Annotating Emotions and Sentiment in Polish Twitter},
  author = {Bogdanowicz, S. and Cwynar, H. and Zwierzchowska, A. and Klamra, C. and Kiera{\'s}, W. and Kobyli{\'n}ski, {\L}.},
  booktitle = {Computational Science -- ICCS 2023},
  series = {Lecture Notes in Computer Science},
  volume = {14074},
  publisher = {Springer, Cham},
  year = {2023},
  doi = {10.1007/978-3-031-36021-3_20}
}
```

## Acknowledgments

- **Base model**: [PKOBP/polish-roberta-8k](https://huggingface.co/PKOBP/polish-roberta-8k)
- **Original dataset**: [CLARIN-PL TwitterEmo](https://huggingface.co/datasets/clarin-pl/twitteremo)
- **Label cleaning**: Cleanlab library for noise detection
- **LLM assistance**: Gemini-2.5-Flash and GPT-4.1 for label review

## License

### License Terms

This model is released under the **GNU General Public License v3.0 (GPL-3.0)**, inherited from the training dataset.

**License Chain:**

- **Base Model** ([PKOBP/polish-roberta-8k](https://huggingface.co/PKOBP/polish-roberta-8k)): Apache-2.0
- **Training Dataset** ([TwitterEmo-PL-Refined](https://huggingface.co/datasets/yazoniak/TwitterEmo-PL-Refined)): GPL-3.0
- **Original Dataset** ([clarin-pl/twitteremo](https://huggingface.co/datasets/clarin-pl/twitteremo)): GPL-3.0
- **This Fine-tuned Model**: **GPL-3.0** (inherited from training data)

### Full License Text

The complete GPL-3.0 license text is available in the [LICENSE](LICENSE) file in this repository, or at: https://www.gnu.org/licenses/gpl-3.0.html

## Model Card Contact

For questions, issues, or feedback about this model, please open an issue in the model repository or contact the author through Hugging Face.

______________________________________________________________________

**Model Version**: v1.0

**Last Updated**: 2025-10-10