🩻 MIMIC-CXR Radiology Report Generator
This repository provides a PyTorch encoder-decoder model for automated radiology report generation using chest X-ray images and structured medical knowledge.
The model is designed to generate clinically coherent free-text radiology reports by jointly reasoning over:
- Dual-view chest X-ray images (frontal + lateral)
- Knowledge graph priors encoding structured medical relationships
This work is intended for academic research and educational purposes.
The code for building the model is hosted in the following GitHub repository:
https://github.com/shemzegem200/MIMIC-CXR-Medical-Report-Generation-Using-Deep-Learning
Additional resources (the pickle file, adjacency matrix, trained model, and grouped dataframe) are available at the following Google Drive link: https://drive.google.com/drive/folders/1DThVS3wzvL9EbdtBQjFHEWIO8ilqLrvz?usp=sharing
1. Overview
Radiology report generation is a challenging problem that requires the integration of:
- Accurate visual understanding of medical images
- Robust clinical language modeling
- Anatomical and pathological consistency
This project proposes a hybrid architecture that integrates:
- Visual features extracted from dual-view chest X-rays
- Structured medical knowledge encoded using a Graph Convolutional Network (GCN)
- Sequential text generation using an LSTM-based decoder
The model is trained and evaluated on the MIMIC-CXR dataset.
2. Model Architecture
2.1 Visual Encoder
- Backbone: ResNet-18
- Input: Two chest X-ray images (frontal + lateral)
- Output: Fixed-length image embeddings
Each image is processed independently, and the resulting embeddings are combined downstream.
2.2 Knowledge Graph Encoder
- Graph representation of medical entities
- Encoder: Graph Convolutional Network (GCN)
- Node features derived from textual embeddings
- Output: Global knowledge embedding via mean pooling
This component provides structured clinical priors that guide report generation.
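As a rough illustration of how a GCN can turn node features into a single global embedding, the NumPy sketch below applies the commonly used symmetric-normalized propagation rule and then mean-pools over nodes. The two-layer layout, the random weights, the layer sizes, and the 3-node toy graph are all assumptions made for the example; the actual GCN in model.py may be structured differently.

```python
import numpy as np

def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2}, a normalization commonly used in GCNs."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degrees are >= 1 for non-negative adj
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_mean_pool(adj, feats, w1, w2):
    """Two graph-convolution layers (ReLU in between), then mean pooling over nodes."""
    a_norm = normalize_adjacency(adj)
    h = np.maximum(a_norm @ feats @ w1, 0.0)   # layer 1 + ReLU
    h = a_norm @ h @ w2                        # layer 2, no activation
    return h.mean(axis=0)                      # one vector for the whole graph

# Toy graph: 3 nodes in a chain (hypothetical, for illustration only)
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8))    # node feature size 8 (assumed)
w1 = rng.normal(size=(8, 16))      # hidden size 16 (assumed)
w2 = rng.normal(size=(16, 4))      # output size 4 (assumed)

kg_embedding = gcn_mean_pool(adj, feats, w1, w2)
print(kg_embedding.shape)
```

Mean pooling makes the output size independent of the number of nodes, which is what lets a graph of any size feed a fixed-size fusion layer.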
2.3 Multimodal Fusion
The image embeddings and the knowledge graph embedding are concatenated and projected into a unified latent space:
[Image_1 | Image_2 | Knowledge_Graph] -> Linear_Projection -> Shared_Embedding
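The fusion step above can be sketched in a few lines of NumPy. This is a minimal illustration only: the dimensions, the random weights, and the choice to reuse one knowledge-graph embedding for every sample in the batch are assumptions, and the repository implements this step in PyTorch.

```python
import numpy as np

def fuse(img1, img2, kg, weight, bias):
    """Concatenate the three embeddings and apply a single linear projection."""
    joint = np.concatenate([img1, img2, kg], axis=-1)   # [Image_1 | Image_2 | Knowledge_Graph]
    return joint @ weight + bias                        # Shared_Embedding

rng = np.random.default_rng(0)
batch, img_dim, kg_dim, shared_dim = 2, 6, 4, 5         # all sizes are assumed

img1 = rng.normal(size=(batch, img_dim))                # frontal-view embeddings
img2 = rng.normal(size=(batch, img_dim))                # lateral-view embeddings
# Assumption: one global KG embedding shared by all samples in the batch
kg = np.broadcast_to(rng.normal(size=kg_dim), (batch, kg_dim))

weight = rng.normal(size=(2 * img_dim + kg_dim, shared_dim))
bias = np.zeros(shared_dim)

shared = fuse(img1, img2, kg, weight, bias)
print(shared.shape)
```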
2.4 Report Decoder
- Decoder: LSTM-based RNN
- Input: Multimodal embedding + tokenized report prefix
- Output: Token-level probability distribution over the vocabulary
Text generation uses nucleus (top-p) sampling with minimum-length constraints.
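For readers unfamiliar with nucleus sampling, the pure-Python sketch below shows one common way to combine top-p filtering with a minimum-length constraint, by blocking the end-of-sequence token until enough tokens have been generated. The function names, the toy distribution, and the way the constraint is enforced are assumptions for illustration; the repository's decoding code may differ.

```python
import random

def top_p_filter(probs, top_p):
    """Keep the smallest set of most-probable tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered

def sample_next(probs, top_p, step, min_len, eos_id, rng=random):
    """Sample one token id; the EOS token is blocked while step < min_len."""
    probs = list(probs)
    if step < min_len:
        probs[eos_id] = 0.0
        total = sum(probs)
        if total == 0.0:
            raise ValueError("no non-EOS token has probability mass")
        probs = [p / total for p in probs]
    filtered = top_p_filter(probs, top_p)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]

rng = random.Random(0)             # fixed seed so the demo is repeatable
probs = [0.5, 0.3, 0.15, 0.05]     # toy distribution; token 0 plays the role of EOS
tokens = [sample_next(probs, top_p=0.9, step=t, min_len=3, eos_id=0, rng=rng)
          for t in range(3)]
print(tokens)                      # token 0 cannot appear: every step is below min_len
```

A lower top_p restricts sampling to fewer high-probability tokens, trading diversity for more conservative wording.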
Architecture Diagram
The following diagram illustrates the overall architecture of the proposed radiology report generation framework, highlighting the integration of visual features, knowledge graph priors, and sequential text generation.
3. Repository Structure
.
├── model.py                  # Encoder, GCN, and decoder definitions
├── inference.py              # Command-line inference & evaluation script
├── utils.py                  # Dataset, collate_fn, KG utilities
├── config.json               # Model hyperparameters and paths
├── final_model.pth           # Trained model weights
├── requirements.txt          # Python dependencies
│
├── assets/
│   └── adjacency_matrix.csv  # Knowledge graph structure
│
├── notebooks/
│   ├── model_building.ipynb                 # Training and experimentation notebook
│   └── knowledge_graph_construction.ipynb   # Knowledge graph construction notebook
│
└── README.md
4. Dataset Notice ⚠️
This repository does NOT include the MIMIC-CXR dataset.
- MIMIC-CXR is subject to PhysioNet credentialed access
- Redistribution of the dataset is prohibited
To obtain the dataset, visit:
https://physionet.org/content/mimic-cxr/
All training and evaluation code assumes that the user has legally obtained access to the dataset.
5. Installation
5.1 Using Conda (Recommended)
conda create -n mrg python=3.10
conda activate mrg
5.2 Without Conda (Using venv)
python -m venv mrg_env
source mrg_env/bin/activate # Linux / macOS
mrg_env\Scripts\activate # Windows
5.3 Install Dependencies
Using requirements.txt:
pip install -r requirements.txt
6. Configuration
Model and inference parameters are defined in config.json, including:
- Embedding dimensions
- Vocabulary size
- Knowledge graph paths
- Tokenizer configuration
Users typically do not need to modify this file for inference.
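If you do edit config.json, a quick sanity check can catch a missing entry before model construction fails. The sketch below checks only the keys that the "Loading the Model Programmatically" section of this README reads; the helper names are hypothetical, and the real file likely contains further entries (paths, tokenizer settings) that are not validated here.

```python
import json

# Keys read by the programmatic loading example in this README.
REQUIRED_KEYS = (
    "embed_dim", "gcn_hidden", "gcn_out",
    "node_feat_dim", "vocab_size", "hidden_dim",
)

def missing_config_keys(cfg: dict) -> list:
    """Return the required keys that are absent from cfg, in a stable order."""
    return [key for key in REQUIRED_KEYS if key not in cfg]

def load_config(path: str = "config.json") -> dict:
    """Load the JSON config and fail early if a required key is missing."""
    with open(path) as f:
        cfg = json.load(f)
    missing = missing_config_keys(cfg)
    if missing:
        raise KeyError(f"config is missing keys: {missing}")
    return cfg

# Demo on an in-memory dict (values are placeholders, not the model's real settings)
print(missing_config_keys({"embed_dim": 1, "vocab_size": 2}))
```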
7. Running Inference
7.1 Single Example Inference
Generate a report from two chest X-ray images:
python inference.py example_frontal.png example_lateral.png
Notes:
- Images must be placed inside the assets/ directory
- Output is printed to the console
- Intended for quick qualitative inspection and demonstration
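Because the script expects its inputs under assets/, a small pre-flight check can give a clearer error than a failed image load. The helper below is a hypothetical convenience and not part of inference.py; it assumes the file names are resolved relative to the assets directory.

```python
from pathlib import Path

def resolve_inputs(names, assets_dir="assets"):
    """Map image file names to paths under assets_dir; raise if any file is missing."""
    paths = [Path(assets_dir) / name for name in names]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing input images: {missing}")
    return paths
```

For example, resolve_inputs(["example_frontal.png", "example_lateral.png"]) returns the two paths when both files are present and raises FileNotFoundError otherwise.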
8. Loading the Model Programmatically
import json
import torch
from model import CombinedEncoder, ReportDecoderRNN, load_checkpoint

# Load configuration
with open("config.json") as f:
    cfg = json.load(f)

# Initialize models
encoder = CombinedEncoder(
    embed_dim=cfg["embed_dim"],
    gcn_hidden=cfg["gcn_hidden"],
    gcn_out=cfg["gcn_out"],
    node_feat_dim=cfg["node_feat_dim"]
)
decoder = ReportDecoderRNN(
    embed_dim=cfg["embed_dim"],
    vocab_size=cfg["vocab_size"],
    hidden_dim=cfg["hidden_dim"]
)

# Load trained weights
load_checkpoint("final_model.pth", encoder, decoder)
encoder.eval()
decoder.eval()
9. Training Code
Training and experimentation code is provided in:
notebooks/model_building.ipynb
This notebook includes:
- Dataset construction
- Data loaders and collate_fn
- Training loop
- Evaluation pipeline
- Model checkpointing
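Reports vary in length, so a collate function for this kind of model usually pads token sequences to a common length before batching. The pure-Python sketch below shows that padding step only; the function name and the pad id of 0 are assumptions, and the notebook's actual collate_fn (which also has to batch the images) may work differently.

```python
def pad_batch(token_seqs, pad_id=0):
    """Right-pad token id sequences to the longest one; also return the true lengths."""
    lengths = [len(seq) for seq in token_seqs]
    max_len = max(lengths)
    padded = [list(seq) + [pad_id] * (max_len - len(seq)) for seq in token_seqs]
    return padded, lengths

padded, lengths = pad_batch([[5, 8, 2], [7]])
print(padded, lengths)
```

Returning the true lengths alongside the padded batch lets the training loop ignore padded positions when computing the loss.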
10. Intended Use
This model is intended for:
- Academic research
- Educational purposes
- Benchmarking radiology report generation methods
⚠️ This model is NOT intended for clinical deployment or medical decision-making.