# DeBERTa-v3-base: Domain-Adaptive Masked-Language Model for Long-COVID Tweets

- **Base model:** `microsoft/deberta-v3-base`
- **Domain data:** 1.2 M English tweets mentioning long COVID (January 2020 → May 2025)
- **Objective:** masked-language modeling (MLM) as domain-adaptive pre-training for downstream COVID-19 NLP tasks
- **Trained with:** 🤗 Transformers 4.41 · 🤗 Datasets 2.21 · PyTorch 2.3 on 2 × NVIDIA H100 (192 GB)
| Split | # examples | Tokens / tweet (p50 / p95) |
|-------|-----------:|---------------------------:|
| Train | 1,140,781  | 23 / 72 |
| Valid | 60,041     | 22 / 71 |
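As a quick consistency check (assuming the validation split was carved from the combined corpus with the `VALIDATION_SPLIT_RATIO = 0.05` listed under Settings), the row counts above can be reproduced exactly:

```python
# Reconstruct the train/valid split sizes from the 5% hold-out ratio.
total = 1_140_781 + 60_041        # train + valid examples = full corpus
valid = int(total * 0.05)         # 5% held out for validation
train = total - valid

print(total, valid, train)        # 1200822 60041 1140781
```

The numbers match the table, so the split ratio and example counts are internally consistent.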
## 🏁 Final metrics (epoch 20)

| Metric | Value |
|--------|------:|
| Validation loss | 1.9203 |
| Validation perplexity | 6.82 |
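For an MLM, perplexity is simply the exponential of the mean token-level cross-entropy loss, so the two reported values should agree:

```python
import math

val_loss = 1.9203                 # mean cross-entropy over masked tokens
perplexity = math.exp(val_loss)   # ppl = e^loss

print(round(perplexity, 2))       # 6.82
```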
## Settings

```python
BASE_CHECKPOINT = "microsoft/deberta-v3-base"
MAX_SEQ_LENGTH = 128
MLM_PROB = 0.15
NUM_EPOCHS = 20
LEARNING_RATE = 5e-5
PER_DEVICE_BATCH_SIZE = 32
OUTPUT_DIR = "./pretrain_deberta_base_covid_with_val/"
LOGGING_STEPS = 500
SAVE_STEPS = 5_000
FP16 = True
VALIDATION_SPLIT_RATIO = 0.05
```
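`MLM_PROB = 0.15` corresponds to the standard BERT-style corruption implemented by 🤗 `DataCollatorForLanguageModeling`: 15% of tokens are selected for prediction, and of those 80% are replaced by `[MASK]`, 10% by a random token, and 10% are left unchanged. A minimal pure-Python sketch of that rule (the `MASK_ID` and `VOCAB_SIZE` below are illustrative placeholders, not the real DeBERTa-v3 tokenizer values):

```python
import random

MLM_PROB = 0.15
MASK_ID = 103          # placeholder mask-token id
VOCAB_SIZE = 128_100   # placeholder vocabulary size
IGNORE = -100          # label value ignored by the cross-entropy loss

def mask_tokens(input_ids, rng):
    """BERT-style MLM corruption: select MLM_PROB of positions; of those,
    80% -> [MASK], 10% -> random token, 10% -> unchanged. Labels are
    IGNORE everywhere except the selected positions."""
    inputs = list(input_ids)
    labels = [IGNORE] * len(input_ids)
    for i, tok in enumerate(input_ids):
        if rng.random() < MLM_PROB:
            labels[i] = tok            # model must predict the original token
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID    # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
            # remaining 10%: keep the original token
    return inputs, labels
```

In the actual training run this logic is handled by the collator; the sketch is only meant to make the `MLM_PROB` hyperparameter concrete.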
## Intended use & limitations

| ✔️ Suitable for | ❌ Not suitable for |
|----------------|--------------------|
| Domain-specific pre-training before fine-tuning on COVID text classification, NLI, QA, summarisation | Providing medical advice or diagnosis |
| MLM-style feature extraction, embeddings, or continued pre-training | Determining actual infection status |
| Research on domain adaptation & concept drift | Use outside English-language tweets |
> **Important:** Tweets are noisy and may include misinformation. Always fine-tune on a task-specific, quality-controlled dataset and manually inspect model outputs.
## Training details

```shell
pip install -U transformers datasets accelerate==0.29

torchrun --nnodes 1 --nproc_per_node 2 pretrain_deberta_v3_base_covid.py \
  --epochs 20 --lr 5e-5 --per_device_bs 32 --fp16 \
  --csv_pattern ".../*.csv" \
  --output_dir ".../pretrain_deberta_base_covid_with_val/"
```
## Citation

```bibtex
@misc{dap-deberta-base-covid2025,
  title        = {DeBERTa-v3-base Domain-Adaptive Pre-Training on LONG-COVID Tweets},
  author       = {Kumar, S. and Contributors},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/your-org/deberta-v3-base-covid2025-mlm}}
}
```