DeBERTa-v3-base
Domain-Adaptive Masked-Language Model for LONG-COVID Tweets

- **Base model:** microsoft/deberta-v3-base
- **Domain data:** 1.2 M English tweets that mention long-COVID (January 2020 → May 2025)
- **Objective:** Masked-language modeling (MLM) for downstream COVID-19 NLP tasks
- **Trained with:** 🤗 Transformers 4.41 · 🤗 Datasets 2.21 · PyTorch 2.3 on 2 × NVIDIA H100 (192 GB)

| Split | # examples | Tokens / tweet (p50 / p95) |
|-------|-----------:|:--------------------------:|
| Train | 1,140,781  | 23 / 72 |
| Valid | 60,041     | 22 / 71 |

🏁 Final metrics (epoch 20)

| Metric | Value |
|--------|------:|
| Validation loss | 1.9203 |
| Validation perplexity | 6.82 |
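The reported perplexity is simply the exponential of the validation cross-entropy loss, which can be checked directly:

```python
import math

# Perplexity for an MLM is exp(mean cross-entropy loss).
val_loss = 1.9203
perplexity = math.exp(val_loss)
print(round(perplexity, 2))  # → 6.82
```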

Settings

```python
BASE_CHECKPOINT = "microsoft/deberta-v3-base"
MAX_SEQ_LENGTH = 128
MLM_PROB = 0.15
NUM_EPOCHS = 20
LEARNING_RATE = 5e-5
PER_DEVICE_BATCH_SIZE = 32
OUTPUT_DIR = "./pretrain_deberta_base_covid_with_val/"
LOGGING_STEPS = 500
SAVE_STEPS = 5_000
FP16 = True
VALIDATION_SPLIT_RATIO = 0.05
```
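`MLM_PROB = 0.15` corresponds to standard BERT-style masking: 15 % of positions are selected, and of those 80 % become `[MASK]`, 10 % become a random token, and 10 % are left unchanged. A minimal stdlib-only sketch of that procedure is below; `MASK_ID` and `VOCAB_SIZE` are illustrative placeholders, not DeBERTa-v3's actual special-token ids.

```python
import random

MASK_ID = 103        # placeholder [MASK] token id (illustrative only)
VOCAB_SIZE = 128_100 # placeholder vocabulary size (illustrative only)
MLM_PROB = 0.15

def mask_tokens(token_ids, rng):
    """BERT-style MLM masking: select ~15% of positions; of those,
    80% -> [MASK], 10% -> random token, 10% -> left unchanged."""
    inputs = list(token_ids)
    labels = [-100] * len(inputs)  # -100 = position ignored by the loss
    for i in range(len(inputs)):
        if rng.random() < MLM_PROB:
            labels[i] = inputs[i]  # predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)
            # else: keep the original token as input
    return inputs, labels
```

In practice this is what `transformers`' `DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)` does batch-wise on tensors.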

Intended use & limitations

| ✔️ Suitable for | ❌ Not suitable for |
|----------------|---------------------|
| Domain-specific pre-training before fine-tuning on COVID text classification, NLI, QA, or summarisation | Providing medical advice or diagnosis |
| MLM-style feature extraction, embeddings, or continued pre-training | Determining actual infection status |
| Research on domain adaptation & concept drift | Use outside English-language tweets |

Important: Tweets are noisy and may include misinformation. Always fine-tune on a task-specific, quality-controlled dataset and manually inspect model outputs.


Training details

```bash
pip install -U transformers datasets accelerate==0.29
torchrun --nnodes 1 --nproc_per_node 2 pretrain_deberta_v3_base_covid.py \
  --epochs 20 --lr 5e-5 --per_device_bs 32 --fp16 \
  --csv_pattern ".../*.csv" \
  --output_dir ".../pretrain_deberta_base_covid_with_val/"
```
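For context, the settings imply the following step counts. This is a back-of-the-envelope sketch: it assumes one example per sequence and no gradient accumulation, neither of which the card states.

```python
import math

# Effective batch size: per-device batch × number of GPUs
# (assumes no gradient accumulation, which the card does not mention).
per_device_bs = 32
num_gpus = 2
effective_bs = per_device_bs * num_gpus  # 64

# Optimizer steps implied by the 1,140,781-example training split.
steps_per_epoch = math.ceil(1_140_781 / effective_bs)
total_steps = steps_per_epoch * 20  # NUM_EPOCHS = 20
print(steps_per_epoch, total_steps)  # → 17825 356500
```

At roughly 17,825 steps per epoch, `SAVE_STEPS = 5_000` yields a few checkpoints per epoch and `LOGGING_STEPS = 500` about 36 log points per epoch.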

Citation

```bibtex
@misc{dap-deberta-base-covid2025,
  title        = {DeBERTa-v3-base Domain-Adaptive Pre-Training on LONG-COVID Tweets},
  author       = {Kumar, S. and Contributors},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/your-org/deberta-v3-base-covid2025-mlm}}
}
```