# MedGemma 4B – CRC Tissue Classification (Fine-tuned)
Fine-tuned version of MedGemma 4B for 9-class colorectal cancer tissue classification from H&E stained histological images.
## Model Description
This model was fine-tuned using LoRA on the NCT-CRC-HE-100K dataset to classify colorectal tissue patches into 9 classes:
| Label | Tissue Type |
|---|---|
| ADI | Adipose |
| BACK | Background |
| DEB | Debris |
| LYM | Lymphocytes |
| MUC | Mucus |
| MUS | Smooth Muscle |
| NORM | Normal Colon Mucosa |
| STR | Cancer-Associated Stroma |
| TUM | Colorectal Adenocarcinoma Epithelium |
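For post-processing the model's one-word reply into a human-readable tissue name, a small lookup table is convenient. The dictionary below simply mirrors the class table above:

```python
# Label codes from NCT-CRC-HE-100K mapped to tissue names
# (mirrors the class table in this model card).
LABEL_TO_TISSUE = {
    "ADI": "Adipose",
    "BACK": "Background",
    "DEB": "Debris",
    "LYM": "Lymphocytes",
    "MUC": "Mucus",
    "MUS": "Smooth Muscle",
    "NORM": "Normal Colon Mucosa",
    "STR": "Cancer-Associated Stroma",
    "TUM": "Colorectal Adenocarcinoma Epithelium",
}

print(LABEL_TO_TISSUE["TUM"])  # Colorectal Adenocarcinoma Epithelium
```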
## Training Details
| Parameter | Value |
|---|---|
| Base model | unsloth/medgemma-4b-it (full 16-bit) |
| Fine-tuning method | LoRA (r=16, alpha=16, all-linear) |
| Trainable parameters | 38,497,792 / 4,338,577,264 (0.89%) |
| Training samples | 9,000 from NCT-CRC-HE-100K |
| Steps | 300 (≈0.27 epochs) |
| Batch size | 4 per device, grad accum 2 (effective = 8) |
| Learning rate | 2e-4 with cosine scheduler |
| Optimizer | AdamW (fused) |
| Hardware | NVIDIA A100-SXM4-80GB |
| Training time | ~21 minutes |
| Final training loss | 0.2072 |
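The derived figures in the table follow directly from the raw counts; a quick sanity check:

```python
# Reproduce the derived numbers in the training table.
trainable, total = 38_497_792, 4_338_577_264
print(f"{100 * trainable / total:.2f}%")  # 0.89% trainable parameters

# Effective batch = per-device batch size * gradient-accumulation steps
effective_batch = 4 * 2
steps_per_epoch = 9_000 / effective_batch  # 1125 steps over 9,000 samples
print(f"{300 / steps_per_epoch:.3f}")     # 0.267 epochs after 300 steps
```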
## Evaluation Results
Evaluated on 500 randomly sampled images (seed=42) from each dataset:
| Model | Dataset | Accuracy | Weighted F1 |
|---|---|---|---|
| Pretrained MedGemma | NCT-CRC-HE-100K (test) | 19.4% | 0.163 |
| Fine-tuned (Ours) | NCT-CRC-HE-100K (test) | 94.2% | 0.943 |
| Pretrained MedGemma | CRC-VAL-HE-7K | 17.4% | 0.147 |
| Fine-tuned (Ours) | CRC-VAL-HE-7K | 94.8% | 0.948 |
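Weighted F1 is the average of per-class F1 scores weighted by each class's support, so frequent classes contribute more. A minimal sketch with illustrative per-class scores (not the actual evaluation numbers):

```python
# Weighted F1: per-class F1 averaged with class support as the weights.
def weighted_f1(f1_scores, supports):
    total = sum(supports)
    return sum(f * s for f, s in zip(f1_scores, supports)) / total

# Illustrative values only, not the real per-class results.
f1 = [0.95, 0.90, 0.99]
support = [100, 50, 50]
print(weighted_f1(f1, support))  # ≈ 0.9475
```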
## How to Use
```python
from unsloth import FastVisionModel
from peft import PeftModel
from PIL import Image
import torch

# Load base model + LoRA adapter
model, processor = FastVisionModel.from_pretrained(
    "unsloth/medgemma-4b-it",
    load_in_4bit=False,
)
model = PeftModel.from_pretrained(model, "Bimokuncoro/medgemma-4b-crc-finetuned")
FastVisionModel.for_inference(model)

# Run inference
image = Image.open("your_tissue_image.png")
prompt = (
    "What type of tissue is shown in this histological image? "
    "Reply with ONLY one word from this list: "
    "ADI, BACK, DEB, LYM, MUC, MUS, NORM, STR, TUM. Do not explain."
)
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": prompt},
    ],
}]
input_text = processor.tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
inputs = processor(
    images=image, text=input_text,
    add_special_tokens=False, return_tensors="pt",
).to("cuda")
input_length = inputs["input_ids"].shape[1]
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=50, use_cache=True)
new_tokens = out[0][input_length:]
result = processor.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
print(result)  # e.g. "TUM"
```
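The prompt asks for a single word, but generated replies can still drift (trailing punctuation, lowercase, a stray explanation). A small post-processing helper (a hypothetical addition, not part of the released code) makes the parsed label robust:

```python
import re

VALID_LABELS = {"ADI", "BACK", "DEB", "LYM", "MUC", "MUS", "NORM", "STR", "TUM"}

def parse_label(reply):
    """Return the first valid class label found in the reply, else None."""
    for token in re.findall(r"[A-Za-z]+", reply):
        if token.upper() in VALID_LABELS:
            return token.upper()
    return None

print(parse_label("TUM."))                       # TUM
print(parse_label("The tissue looks like mus"))  # MUS
print(parse_label("no idea"))                    # None
```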
## Dataset
- Training: `1aurent/NCT-CRC-HE`, split `NCT_CRC_HE_100K`, 9,000 samples
- Evaluation A: same dataset, held-out test split, 500 samples
- Evaluation B: split `CRC_VAL_HE_7K`, 500 samples (never seen during training)
## Citation / Acknowledgements
- Base model: Google MedGemma
- Fine-tuning framework: Unsloth
- Dataset: NCT-CRC-HE by 1aurent