# BERT Spanish Sensationalism Classifier
Fine-tuned BERT model for detecting sensationalism in Spanish news articles.
The model performs binary text classification to determine whether a news item uses sensationalist techniques or not.
## Model Details

### Model Description
This model is a fine-tuned version of dccuchile/bert-base-spanish-wwm-cased, adapted for the task of sensationalism detection in Spanish news articles.
The classification is based on the combination of the news title and body text, allowing the model to capture both lexical and contextual cues commonly associated with sensationalist content.
- Developed by: Julen Neila
- Shared by: Julen Neila
- Model type: Transformer-based text classifier (BERT)
- Language(s): Spanish
- License: Apache 2.0
- Finetuned from model: dccuchile/bert-base-spanish-wwm-cased
### Model Sources
- Base model: dccuchile/bert-base-spanish-wwm-cased
- Framework: Hugging Face Transformers
## Uses

### Direct Use
The model can be directly used to:
- Detect sensationalism in Spanish news headlines and articles
- Support media analysis and journalism studies
- Assist in content moderation and media monitoring pipelines
### Downstream Use
The model can be integrated into:
- News aggregation systems
- Media bias and sensationalism analysis
- Academic NLP research projects
- Larger information extraction or classification pipelines
### Out-of-Scope Use
The model is not recommended for:
- Social media posts or informal text
- Non-Spanish content
- Legal, medical, or high-stakes decision-making systems
## Bias, Risks, and Limitations
- The model reflects biases present in the training data.
- It may underperform on very short texts or headlines without sufficient context.
- It may not generalize well to domains outside traditional digital journalism.
### Recommendations
Users should be aware of these limitations and avoid deploying the model in high-impact decision-making contexts without additional validation.
## How to Get Started with the Model
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="JJNeila/bert-spanish-sensationalism-oss",
    tokenizer="JJNeila/bert-spanish-sensationalism-oss",
)

classifier("Estados Unidos entrena a 25.000 militares (1.400 españoles) para defender el este de Europa")
```
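The pipeline returns a list of dicts with a `label` and a `score`. A minimal post-processing sketch, assuming the default `LABEL_0`/`LABEL_1` label names (check the model's `config.json` `id2label` mapping for the actual names):

```python
# NOTE: the label names "LABEL_0"/"LABEL_1" are an assumption; the
# model's config.json may define custom id2label names instead.
LABEL_NAMES = {"LABEL_0": "non-sensationalist", "LABEL_1": "sensationalist"}

def interpret(outputs, threshold=0.5):
    """Map raw pipeline outputs to readable classes, flagging low-confidence calls."""
    results = []
    for out in outputs:
        results.append({
            "class": LABEL_NAMES.get(out["label"], out["label"]),
            "score": out["score"],
            "confident": out["score"] >= threshold,
        })
    return results

# Example with a mocked pipeline output:
mock_output = [{"label": "LABEL_1", "score": 0.93}]
print(interpret(mock_output))
```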
## Training Details
### Training Data
The model was trained on a curated dataset of Spanish news articles annotated for the presence of sensationalism.
- **Size:** ~3,163 labeled samples
- **Labels:**
- `0` → Non-sensationalism
- `1` → Sensationalism
The input format used during training was:
*title* + *[SEP]* + *text*
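The concatenation above can be sketched as a small helper (whether `[SEP]` was inserted literally in the string or via the tokenizer's sentence-pair API is an assumption; adjust to match the actual preprocessing):

```python
def build_input(title: str, body: str, sep_token: str = "[SEP]") -> str:
    """Reproduce the training-time input format: title [SEP] body."""
    return f"{title} {sep_token} {body}"

example = build_input(
    "Estados Unidos entrena a 25.000 militares",
    "El despliegue forma parte de unas maniobras en el este de Europa.",
)
print(example)
```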
### Training Procedure
#### Preprocessing
- Removal of unlabeled samples
- Concatenation of title and article text
- Tokenization using the base BERT Spanish tokenizer
- Maximum sequence length: **512 tokens**
#### Training Hyperparameters
- **Training regime:** fp16 mixed precision
- **Optimizer:** AdamW
- **Learning rate:** 2e-5
- **Batch size:** 8
- **Epochs:** 3
- **Weight decay:** 0.01
- **Evaluation metric for model selection:** F1
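The hyperparameters above map onto a Hugging Face `TrainingArguments` configuration roughly as follows. This is a sketch, not the exact training script; the `output_dir` name is illustrative, and older Transformers versions spell `eval_strategy` as `evaluation_strategy`:

```python
from transformers import TrainingArguments

# Sketch of a TrainingArguments setup matching the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="bert-spanish-sensationalism",  # illustrative path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    fp16=True,                      # mixed-precision training on the T4
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",     # model selection by F1, as stated above
)
```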
#### Speeds, Sizes, Times
- **Training time:** ~0.5 hours
- **Hardware:** NVIDIA T4 GPU
- **Final model size:** ~440 MB
### Testing Data, Factors & Metrics
#### Testing Data
A held-out validation set (20%) stratified by class labels.
#### Metrics
The following metrics were used due to class imbalance considerations:
- Accuracy
- Precision
- Recall
- F1-score
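For reference, all four metrics can be derived from the binary confusion counts. A pure-Python sketch (the actual evaluation likely used a library such as `scikit-learn` or `evaluate`):

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the positive (sensationalism) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy example: 4 predictions, one positive missed.
print(binary_metrics([1, 0, 1, 0], [1, 0, 0, 0]))
```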
### Results
| Metric | Value |
|-----------|-------|
| Accuracy | 0.84 |
| Precision | 0.84 |
| Recall | 0.83 |
| F1-score | 0.84 |
#### Summary
The model achieves a strong balance between precision and recall, making it effective at identifying sensationalist content without excessive false positives.
---
## Environmental Impact
- **Hardware Type:** NVIDIA T4 GPU
- **Hours used:** ~0.5 hours
- **Cloud Provider:** Google Colab
- **Compute Region:** Europe
- **Carbon Emitted:** Not explicitly measured
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
## Technical Specifications
### Model Architecture and Objective
- **Architecture:** BERT-base (12 layers, ~110M parameters)
- **Objective:** Binary cross-entropy loss for text classification
#### Hardware
- NVIDIA T4 GPU (16 GB VRAM)
#### Software
- Python 3.12
- PyTorch
- Transformers
- Hugging Face Datasets
## Citation
**BibTeX:**
```bibtex
@misc{neila2026sensationalism,
  title={BERT Spanish Sensationalism Classifier},
  author={Neila, Julen},
  year={2026},
  publisher={Hugging Face}
}
```
## Model Card Authors
**Julen Neila**
## Model Card Contact
https://huggingface.co/JJNeila