Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning
Abstract
Colon-X advances multimodal intelligence in colonoscopy by constructing comprehensive datasets and developing reasoning-centric models that outperform supervised fine-tuning under data scarcity.
In this study, we present Colon-X, an open initiative aimed at advancing multimodal intelligence in colonoscopy. We begin by constructing ColonVQA, the most comprehensive multimodal dataset ever built for colonoscopy, featuring over 1.1 million visual question answering entries across 76 clinical findings and 18 multimodal tasks. Beyond serving as a community-wide data foundation, we further investigate a critical yet underexplored transition in colonoscopy: evolving from multimodal understanding to clinical reasoning. (a) To capture the current landscape of multimodal understanding behaviors, we systematically assess the generalizability of 22 multimodal large language models (MLLMs) and examine their reliability under human-induced perturbations. The results reveal that clinical outputs from leading MLLMs remain far from robust and trustworthy. (b) To narrow this gap, we further explore reasoning-centric intelligence tailored for colonoscopy. Specifically, we curate ColonReason, a clinically grounded reasoning dataset annotated through a multi-expert debating pipeline, and develop ColonR1, the first R1-styled model incorporating task-adaptive rewarding and gradient-stable optimization techniques. Under data-scarce conditions, our ColonR1 achieves 56.61% overall accuracy, outperforming supervised fine-tuning by 25.22%, and sets a new reasoning-enabled baseline for multimodal colonoscopy analysis. All data and model resources are publicly available at https://github.com/ai4colonoscopy/Colon-X.
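For context on the R1-style recipe mentioned above, the minimal sketch below illustrates what "task-adaptive rewarding" could look like: a format reward for structured think/answer outputs blended with an accuracy rule that changes by task type. All function names, task labels, and weightings here are hypothetical assumptions for illustration and are not taken from the ColonR1 implementation.

```python
import re

# Hypothetical illustration of task-adaptive rewarding for R1-style training.
# Task names and scoring rules are assumptions, not the paper's actual recipe.

def format_reward(completion: str) -> float:
    """Reward well-formed <think>...</think><answer>...</answer> outputs."""
    pattern = r"<think>.+?</think>\s*<answer>.+?</answer>"
    return 1.0 if re.search(pattern, completion, flags=re.DOTALL) else 0.0

def answer_of(completion: str) -> str:
    """Extract the text inside the <answer> tag (empty string if missing)."""
    match = re.search(r"<answer>(.+?)</answer>", completion, flags=re.DOTALL)
    return match.group(1).strip().lower() if match else ""

def accuracy_reward(completion: str, reference: str, task: str) -> float:
    """Score the extracted answer with a rule chosen per task type."""
    pred, ref = answer_of(completion), reference.strip().lower()
    if task in {"classification", "multiple_choice"}:  # closed-ended: exact match
        return 1.0 if pred == ref else 0.0
    # open-ended (e.g., report-style answers): token-level F1 as a soft match
    p, r = set(pred.split()), set(ref.split())
    if not p or not r:
        return 0.0
    overlap = len(p & r)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

def task_adaptive_reward(completion: str, reference: str, task: str,
                         w_fmt: float = 0.2, w_acc: float = 0.8) -> float:
    """Blend format and task-specific accuracy rewards into one scalar."""
    return w_fmt * format_reward(completion) + w_acc * accuracy_reward(completion, reference, task)

# Example: a closed-ended classification answer
print(task_adaptive_reward(
    "<think>The lesion is sessile with a granular surface.</think>"
    "<answer>hyperplastic polyp</answer>",
    "hyperplastic polyp", task="classification"))  # -> 1.0
```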
Community
Project COLON-X: Pushing the neXt frontier in intelligent COLONoscopy
Colonoscopy saves lives, but AI for colonoscopy is still far from intelligent. We are excited to launch the Colon-X project, an open initiative aimed at advancing multimodal intelligence in colonoscopy and beyond. Beyond serving as a community-wide data foundation, we're focused on a critical yet underexplored transition: evolving from multimodal understanding to clinical reasoning.
- Paper: https://arxiv.org/abs/2512.03667
- Github: https://github.com/ai4colonoscopy/Colon-X
- Keywords: Multimodal Colonoscopy Analysis, Multimodal Understanding, Clinical Reasoning, Reinforcement Learning, Multimodal Benchmark, AI Healthcare, and Abdomen
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- GEMeX-RMCoT: An Enhanced Med-VQA Dataset for Region-Aware Multimodal Chain-of-Thought Reasoning (2025)
- Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning (2025)
- SpineBench: Benchmarking Multimodal LLMs for Spinal Pathology Analysis (2025)
- MedGEN-Bench: Contextually entangled benchmark for open-ended multimodal medical generation (2025)
- Think Twice to See More: Iterative Visual Reasoning in Medical VLMs (2025)
- OctoMed: Data Recipes for State-of-the-Art Multimodal Medical Reasoning (2025)
- Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation (2025)
