---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
license: mit
---

## A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text?

- Code: [DLILP](https://github.com/jusiro/DLILP)
- Paper: [IPMI 2025](https://link.springer.com/chapter/10.1007/978-3-031-96625-5_20) - [arXiv](https://arxiv.org/abs/2504.05227)
- Docs: [Documentation](https://github.com/jusiro/DLILP)
- Tutorial: [Notebook](https://colab.research.google.com/drive/1_8Ysd8mCKuLX_Q86e-7pOAHFbSR9F4aZ?usp=sharing)

### About "CONVIRT" weights:

- Pre-trained with a vanilla CLIP contrastive loss, which is very similar to the pre-training proposed earlier in the [ConVIRT](https://arxiv.org/abs/2010.00747) paper (2020).
- Pre-trained on MIMIC.
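
For readers unfamiliar with the objective, the following is a minimal, dependency-free sketch of a symmetric CLIP-style contrastive loss. It is an illustration only, not the training code used for these weights: the function names, the fixed temperature of `0.07`, and the toy embeddings are all assumptions, and a real implementation would operate on batched tensors from the image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy(logits, target):
    """Cross-entropy of one row of logits against a target index (stable log-sum-exp)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric image-to-text / text-to-image contrastive loss.

    image_emb[i] and text_emb[i] are assumed to come from the same
    image-report pair; every other pairing in the batch is a negative.
    The temperature value is an assumption for illustration.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)

    # logits[i][j] = cosine similarity between image i and text j, scaled.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image-to-text: each row should select its own column.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column should select its own row.
    loss_t2i = sum(
        cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n

    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    # Toy embeddings: matched pairs give a low loss, swapped pairs a high one.
    images = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]]))
    print(clip_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]]))
```

The loss is low when each image embedding is closest to its own report embedding and high when the pairs are mismatched, which is the signal that drives this kind of pre-training.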

If you find this repository useful, please consider citing these papers:
```
@inproceedings{convirt,
    author = {Yuhao Zhang and others},
    booktitle = {MLHC},
    pages = {1-24},
    title = {Contrastive Learning of Medical Visual Representations from Paired Images and Text},
    year = {2022},
}

@inproceedings{dlilp,
    title={A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text?},
    author={Julio Silva-Rodríguez and Jose Dolz and Ismail {Ben Ayed}},
    booktitle={Information Processing in Medical Imaging (IPMI)},
    year={2025}
}
```