# DeBERTa-v3-base: Domain-Adaptive Masked-Language Model for Long-COVID Tweets

- **Base model:** `microsoft/deberta-v3-base`
- **Domain data:** 1.2 M English tweets mentioning long COVID (January 2020 → May 2025)
- **Objective:** masked-language modeling (MLM) as domain-adaptive pre-training for downstream COVID-19 NLP tasks
- **Trained with:** 🤗 Transformers 4.41 · 🤗 Datasets 2.21 · PyTorch 2.3 on 2 × NVIDIA H100 (192 GB)
| Split | # examples | Tokens / tweet (p50 / p95) |
|-------|-----------:|---------------------------:|
| Train | 1,140,781  | 23 / 72 |
| Valid | 60,041     | 22 / 71 |
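As a quick consistency check (assuming the validation split was carved from the combined corpus with the `VALIDATION_SPLIT_RATIO = 0.05` listed under Settings), the row counts above can be reproduced exactly:

```python
# Reconstruct the train/valid split sizes from the 5% hold-out ratio.
total = 1_140_781 + 60_041        # train + valid examples = full corpus
valid = int(total * 0.05)         # 5% held out for validation
train = total - valid

print(total, valid, train)        # 1200822 60041 1140781
```

The numbers match the table, so the split ratio and example counts are internally consistent.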
## 🏁 Final metrics (epoch 20)

| Metric | Value |
|--------|------:|
| Validation loss | 1.9203 |
| Validation perplexity | 6.82 |
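For an MLM, perplexity is simply the exponential of the mean token-level cross-entropy loss, so the two reported values should agree:

```python
import math

val_loss = 1.9203                 # mean cross-entropy over masked tokens
perplexity = math.exp(val_loss)   # ppl = e^loss

print(round(perplexity, 2))       # 6.82
```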
## Settings

```python
BASE_CHECKPOINT = "microsoft/deberta-v3-base"
MAX_SEQ_LENGTH = 128
MLM_PROB = 0.15
NUM_EPOCHS = 20
LEARNING_RATE = 5e-5
PER_DEVICE_BATCH_SIZE = 32
OUTPUT_DIR = "./pretrain_deberta_base_covid_with_val/"
LOGGING_STEPS = 500
SAVE_STEPS = 5_000
FP16 = True
VALIDATION_SPLIT_RATIO = 0.05
```
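`MLM_PROB = 0.15` corresponds to the standard BERT-style corruption implemented by 🤗 `DataCollatorForLanguageModeling`: 15% of tokens are selected for prediction, and of those 80% are replaced by `[MASK]`, 10% by a random token, and 10% are left unchanged. A minimal pure-Python sketch of that rule (the `MASK_ID` and `VOCAB_SIZE` below are illustrative placeholders, not the real DeBERTa-v3 tokenizer values):

```python
import random

MLM_PROB = 0.15
MASK_ID = 103          # placeholder mask-token id
VOCAB_SIZE = 128_100   # placeholder vocabulary size
IGNORE = -100          # label value ignored by the cross-entropy loss

def mask_tokens(input_ids, rng):
    """BERT-style MLM corruption: select MLM_PROB of positions; of those,
    80% -> [MASK], 10% -> random token, 10% -> unchanged. Labels are
    IGNORE everywhere except the selected positions."""
    inputs = list(input_ids)
    labels = [IGNORE] * len(input_ids)
    for i, tok in enumerate(input_ids):
        if rng.random() < MLM_PROB:
            labels[i] = tok            # model must predict the original token
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID    # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
            # remaining 10%: keep the original token
    return inputs, labels
```

In the actual training run this logic is handled by the collator; the sketch is only meant to make the `MLM_PROB` hyperparameter concrete.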
## Intended use & limitations

| ✔️ Suitable for | ❌ Not suitable for |
|----------------|--------------------|
| Domain-specific pre-training before fine-tuning on COVID text classification, NLI, QA, summarisation | Providing medical advice or diagnosis |
| MLM-style feature extraction, embeddings, or continued pre-training | Determining actual infection status |
| Research on domain adaptation & concept drift | Use outside English-language tweets |
> **Important:** Tweets are noisy and may include misinformation. Always fine-tune on a task-specific, quality-controlled dataset and manually inspect model outputs.
## Training details

```shell
pip install -U transformers datasets accelerate==0.29

torchrun --nnodes 1 --nproc_per_node 2 pretrain_deberta_v3_base_covid.py \
  --epochs 20 --lr 5e-5 --per_device_bs 32 --fp16 \
  --csv_pattern ".../*.csv" \
  --output_dir ".../pretrain_deberta_base_covid_with_val/"
```
## Citation

```bibtex
@misc{dap-deberta-base-covid2025,
  title        = {DeBERTa-v3-base Domain-Adaptive Pre-Training on LONG-COVID Tweets},
  author       = {Kumar, S. and Contributors},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/your-org/deberta-v3-base-covid2025-mlm}}
}
```