---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: image-feature-extraction
library_name: timm
metrics:
- accuracy
---

# Model Card for Digepath

`Digepath` is a self-supervised foundation model for intelligent gastrointestinal pathology image analysis.

arXiv preprint: [https://arxiv.org/abs/2505.21928](https://arxiv.org/abs/2505.21928)

The model is a Vision Transformer Large/16 pre-trained with DINOv2 [1] self-supervised learning on 353 million multi-scale images from 210,043 H&E-stained gastrointestinal-related slides.

## Introduction to Digepath

Gastrointestinal (GI) diseases represent a clinically significant burden, necessitating precise diagnostic approaches to optimize patient outcomes. Conventional histopathological diagnosis suffers from limited reproducibility and diagnostic variability. To overcome these limitations, we developed Digepath, a specialized foundation model for GI pathology. Our framework introduces a dual-phase iterative optimization strategy that combines pretraining with fine-screening and is specifically designed to detect sparsely distributed lesion areas in whole-slide images. Digepath was first pretrained on a large-scale dataset of over _**353**_ million multi-scale images derived from _**210,043**_ H&E-stained slides of GI diseases. In the second stage, it was fine-tuned on _**471,443**_ carefully selected regions of interest (ROIs). It attains state-of-the-art performance on 32 out of 33 GI pathology tasks, including pathological diagnosis, protein expression status prediction, gene mutation prediction, and prognosis evaluation. _**Digepath**_ demonstrates broad applicability across diverse clinical tasks, highlighting its potential for reliable deployment in real-world pathology workflows.

![](https://huggingface.co/xtxx/Digepath/resolve/main/Digepath.png)

## Using Digepath to extract features from gastrointestinal pathology images

```python
import timm
import torch
import torchvision.transforms as transforms

# Load the pretrained Digepath weights from the Hugging Face Hub.
model = timm.create_model('hf_hub:xtxx/Digepath', pretrained=True, init_values=1e-5, dynamic_img_size=True)

# Preprocessing for PIL images; apply as preprocess(img).unsqueeze(0).
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

# Use the GPU when one is available, otherwise fall back to the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
model.eval()

# Random tensor standing in for one preprocessed 224x224 RGB image.
image = torch.randn([1, 3, 224, 224], device=device)
with torch.no_grad():
    output = model(image)  # [1, 1024]
```

## Training Pipeline

- Self-Supervised Learning: https://github.com/facebookresearch/dinov2

## Evaluation Pipeline

- WSI Classification: https://github.com/lingxitong/MIL_BASELINE
- ROI Classification: https://github.com/lingxitong/HistoROIBench
- ROI Segmentation: https://github.com/lingxitong/PFM_Segmentation

## Citation

If `Digepath` is helpful to you, please cite our work.

```
@article{zhu2025subspecialty,
  title={Subspecialty-specific foundation model for intelligent gastrointestinal pathology},
  author={Zhu, Lianghui and Ling, Xitong and Ouyang, Minxi and Liu, Xiaoping and Guan, Tian and Fu, Mingxi and Cheng, Zhiqiang and Fu, Fanglei and Zeng, Maomao and Liu, Liming and others},
  journal={arXiv preprint arXiv:2505.21928},
  year={2025}
}
```

## References

[1] Oquab, Maxime, et al. "DINOv2: Learning robust visual features without supervision." arXiv preprint arXiv:2304.07193 (2023).
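
## Sketch: Batch Feature Extraction and Slide-Level Pooling

The single-image example above extends naturally to the sets of tiles used in ROI and WSI evaluation. The following is a minimal sketch under stated assumptions, not the authors' pipeline: `extract_features` and `slide_embedding` are hypothetical helper names, mean pooling is used only as the simplest possible aggregation and is not claimed to be what the linked evaluation repositories do, and a tiny stand-in encoder with a 1024-dimensional output replaces Digepath so the snippet runs without downloading weights.

```python
import torch
from torch import nn


@torch.no_grad()
def extract_features(encoder: nn.Module, tiles: torch.Tensor,
                     batch_size: int = 32, device: str = "cpu") -> torch.Tensor:
    """Run `encoder` over `tiles` ([N, 3, H, W]) in mini-batches; returns [N, D] on CPU."""
    encoder = encoder.to(device).eval()
    chunks = []
    for start in range(0, tiles.shape[0], batch_size):
        batch = tiles[start:start + batch_size].to(device)
        chunks.append(encoder(batch).cpu())
    return torch.cat(chunks, dim=0)


def slide_embedding(features: torch.Tensor) -> torch.Tensor:
    """Mean-pool tile features [N, D] into one slide-level vector [D]."""
    return features.mean(dim=0)


# Hypothetical stand-in for Digepath: any module mapping [B, 3, H, W] -> [B, 1024].
stand_in_encoder = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),  # [B, 3, H, W] -> [B, 3, 1, 1]
    nn.Flatten(),             # -> [B, 3]
    nn.Linear(3, 1024),       # -> [B, 1024]
)

# Random tensors standing in for eight preprocessed 224x224 tiles from one slide.
tiles = torch.randn(8, 3, 224, 224)
features = extract_features(stand_in_encoder, tiles, batch_size=4)
print(features.shape)                   # torch.Size([8, 1024])
print(slide_embedding(features).shape)  # torch.Size([1024])
```

To run this with Digepath itself, pass the `model` returned by `timm.create_model` in place of `stand_in_encoder` and feed tiles that have gone through `preprocess`. Processing tiles in mini-batches keeps memory use bounded when a slide yields thousands of tiles.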