---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: image-feature-extraction
library_name: timm
metrics:
- accuracy
---
# Model Card for Digepath

`Digepath` is a self-supervised foundation model for gastrointestinal pathology image analysis. arXiv preprint: [https://arxiv.org/abs/2505.21928](https://arxiv.org/abs/2505.21928)

The model is a Vision Transformer Large/16 (ViT-L/16) pretrained with DINOv2 [1] self-supervised learning on 353 million multi-scale images from 210,043 H&E-stained gastrointestinal-related slides.

## Introduction to Digepath

Gastrointestinal (GI) diseases impose a significant clinical burden and require precise diagnosis to optimize patient outcomes. Conventional histopathological diagnosis is limited by poor reproducibility and diagnostic variability. To overcome these limitations, we developed Digepath, a specialized foundation model for GI pathology. Our framework introduces a dual-phase iterative optimization strategy that combines pretraining with fine-screening and is specifically designed to detect sparsely distributed lesion areas in whole-slide images. In the first stage, Digepath was pretrained on a large-scale dataset of over _**353**_ million multi-scale images derived from _**210,043**_ H&E-stained slides of GI diseases. In the second stage, it was fine-tuned on _**471,443**_ carefully selected regions of interest (ROIs). It attains state-of-the-art performance on 32 of 33 GI pathology tasks, including pathological diagnosis, protein expression status prediction, gene mutation prediction, and prognosis evaluation. _**Digepath**_ demonstrates broad applicability across diverse clinical tasks, highlighting its potential for reliable deployment in real-world pathology workflows.

![](https://huggingface.co/xtxx/Digepath/resolve/main/Digepath.png)


## Using Digepath to extract features from gastrointestinal pathology images

```python
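import timm
import torch
import torchvision.transforms as transforms

# Load the pretrained weights from the Hugging Face Hub.
model = timm.create_model('hf_hub:xtxx/Digepath', pretrained=True, init_values=1e-5, dynamic_img_size=True)

# Preprocessing for PIL images: resize, convert to tensor, normalize with ImageNet statistics.
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

# Use the GPU when available, otherwise fall back to the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
model.eval()

# Random tensor standing in for a batch of preprocessed images.
x = torch.randn([1, 3, 224, 224], device=device)

with torch.no_grad():
    output = model(x)  # [1, 1024]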
```

## Training Pipeline
- Self-Supervised Learning: https://github.com/facebookresearch/dinov2

## Evaluation Pipeline
- WSI Classification: https://github.com/lingxitong/MIL_BASELINE
- ROI Classification: https://github.com/lingxitong/HistoROIBench
- ROI Segmentation: https://github.com/lingxitong/PFM_Segmentation

## Citation

If `Digepath` is helpful to you, please cite our work.

```bibtex
@article{zhu2025subspecialty,
  title={Subspecialty-specific foundation model for intelligent gastrointestinal pathology},
  author={Zhu, Lianghui and Ling, Xitong and Ouyang, Minxi and Liu, Xiaoping and Guan, Tian and Fu, Mingxi and Cheng, Zhiqiang and Fu, Fanglei and Zeng, Maomao and Liu, Liming and others},
  journal={arXiv preprint arXiv:2505.21928},
  year={2025}
}
```

## References

[1] Oquab, Maxime, et al. "DINOv2: Learning robust visual features without supervision." arXiv preprint arXiv:2304.07193 (2023).