🩻 MIMIC-CXR Radiology Report Generator

This repository provides a PyTorch encoder–decoder model for automated radiology report generation using chest X-ray images and structured medical knowledge.

The model is designed to generate clinically coherent free-text radiology reports by jointly reasoning over:

  • Dual-view chest X-ray images (frontal + lateral)
  • Knowledge graph priors encoding structured medical relationships

This work is intended for academic research and educational purposes.

📌 The codebase for building the model is hosted in the following GitHub repository:
https://github.com/shemzegem200/MIMIC-CXR-Medical-Report-Generation-Using-Deep-Learning

📌 Additional resources, such as the pickle file, adjacency matrix, trained model, and grouped dataframe, are available at the following Drive link: https://drive.google.com/drive/folders/1DThVS3wzvL9EbdtBQjFHEWIO8ilqLrvz?usp=sharing


1. Overview

Radiology report generation is a challenging problem that requires the integration of:

  • Accurate visual understanding of medical images
  • Robust clinical language modeling
  • Anatomical and pathological consistency

This project proposes a hybrid architecture that integrates:

  1. Visual features extracted from dual-view chest X-rays
  2. Structured medical knowledge encoded using a Graph Convolutional Network (GCN)
  3. Sequential text generation using an LSTM-based decoder

The model is trained and evaluated on the MIMIC-CXR dataset.


2. Model Architecture

2.1 Visual Encoder

  • Backbone: ResNet-18
  • Input: Two chest X-ray images (frontal + lateral)
  • Output: Fixed-length image embeddings

Each image is processed independently, and the resulting embeddings are combined downstream.
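
A minimal sketch of what this stage could look like, assuming a torchvision ResNet-18 backbone with its classification head stripped (class and variable names here are illustrative, not the repository's actual definitions in model.py):

import torch
import torch.nn as nn
from torchvision import models

class DualViewEncoder(nn.Module):
    """Illustrative dual-view encoder: one shared ResNet-18 backbone
    applied independently to the frontal and lateral images."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()            # strip the classification head
        self.backbone = backbone
        self.proj = nn.Linear(512, embed_dim)  # ResNet-18 features are 512-d

    def forward(self, frontal: torch.Tensor, lateral: torch.Tensor):
        f = self.proj(self.backbone(frontal))  # (B, embed_dim)
        l = self.proj(self.backbone(lateral))  # (B, embed_dim)
        return f, l                            # fused downstream (Section 2.3)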


2.2 Knowledge Graph Encoder

  • Graph representation of medical entities
  • Encoder: Graph Convolutional Network (GCN)
  • Node features derived from textual embeddings
  • Output: Global knowledge embedding via mean pooling

This component provides structured clinical priors that guide report generation.
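
A minimal sketch of such a GCN encoder, assuming a dense adjacency matrix (e.g. loaded from assets/adjacency_matrix.csv) and plain PyTorch rather than a graph library; names are illustrative:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGCN(nn.Module):
    """Illustrative two-layer GCN: node features are propagated over a
    row-normalized adjacency matrix, then mean-pooled into one vector."""
    def __init__(self, node_feat_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.fc1 = nn.Linear(node_feat_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor):
        # x: (num_nodes, node_feat_dim); adj: (num_nodes, num_nodes)
        adj = adj / adj.sum(dim=1, keepdim=True).clamp(min=1e-6)  # row-normalize
        h = F.relu(self.fc1(adj @ x))
        h = self.fc2(adj @ h)
        return h.mean(dim=0)  # global knowledge embedding via mean pooling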


2.3 Multimodal Fusion

The image embeddings and the knowledge graph embedding are concatenated and projected into a unified latent space:

[Image_1 | Image_2 | Knowledge_Graph] -> Linear_Projection -> Shared_Embedding
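
In code, the fusion reduces to a concatenation followed by a linear layer; a sketch with assumed dimensions (the real values live in config.json):

import torch
import torch.nn as nn

embed_dim, kg_dim = 256, 128                 # assumed, see config.json
fuse = nn.Linear(2 * embed_dim + kg_dim, embed_dim)

img1 = torch.randn(1, embed_dim)             # frontal-view embedding
img2 = torch.randn(1, embed_dim)             # lateral-view embedding
kg = torch.randn(1, kg_dim)                  # knowledge-graph embedding

shared = fuse(torch.cat([img1, img2, kg], dim=-1))  # shared embedding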

2.4 Report Decoder

  • Decoder: LSTM-based RNN
  • Input: Multimodal embedding + tokenized report prefix
  • Output: Token-level probability distribution over the vocabulary

Text generation uses nucleus (top-p) sampling with minimum-length constraints.
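
For reference, nucleus sampling with a minimum-length constraint can be written in a few lines; this is a generic sketch, not the exact routine in inference.py (token IDs and thresholds are placeholders):

import torch

def top_p_sample(logits: torch.Tensor, p: float = 0.9,
                 eos_id: int = 2, step: int = 0, min_len: int = 10) -> int:
    """Sample from the smallest token set whose cumulative probability
    reaches p; forbid EOS before min_len decoding steps."""
    if step < min_len:
        logits = logits.clone()
        logits[eos_id] = float("-inf")        # enforce minimum report length
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_ids = probs.sort(descending=True)
    keep = sorted_probs.cumsum(dim=-1) <= p
    keep[0] = True                            # always keep the top token
    kept = sorted_probs[keep] / sorted_probs[keep].sum()
    choice = torch.multinomial(kept, 1).item()
    return sorted_ids[keep][choice].item()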

2.5 Architecture Diagram

The following diagram illustrates the overall architecture of the proposed radiology report generation framework, highlighting the integration of visual features, knowledge graph priors, and sequential text generation.

[Figure: Model architecture]


3. Repository Structure

.
├── model.py                                 # Encoder, GCN, and decoder definitions
├── inference.py                             # Command-line inference & evaluation script
├── utils.py                                 # Dataset, collate_fn, KG utilities
├── config.json                              # Model hyperparameters and paths
├── final_model.pth                          # Trained model weights
├── requirements.txt                         # Python dependencies
│
├── assets/
│   └── adjacency_matrix.csv                 # Knowledge graph structure
│
├── notebooks/
│   ├── model_building.ipynb                 # Training and experimentation notebook
│   └── knowledge_graph_construction.ipynb   # Knowledge graph construction notebook
│
└── README.md

4. Dataset Notice ⚠️

This repository does NOT include the MIMIC-CXR dataset.

  • MIMIC-CXR is subject to PhysioNet credentialed access
  • Redistribution of the dataset is prohibited

To obtain the dataset, visit:
https://physionet.org/content/mimic-cxr/

All training and evaluation code assumes that the user has legally obtained access to the dataset.


5. Installation

5.1 Using Conda (Recommended)

conda create -n mrg python=3.10
conda activate mrg

5.2 Without Conda (Using venv)

python -m venv mrg_env
source mrg_env/bin/activate        # Linux / macOS
mrg_env\Scripts\activate           # Windows

5.3 Install Dependencies

Using requirements.txt:

pip install -r requirements.txt

6. Configuration

Model and inference parameters are defined in config.json, including:

  • Embedding dimensions
  • Vocabulary size
  • Knowledge graph paths
  • Tokenizer configuration

Users typically do not need to modify this file for inference.
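
For reference, the file follows roughly this shape; the values below are placeholders, not the shipped configuration (the key names mirror those read in Section 8, while the path and tokenizer entries are assumptions):

{
  "embed_dim": 256,
  "vocab_size": 10000,
  "hidden_dim": 512,
  "gcn_hidden": 128,
  "gcn_out": 128,
  "node_feat_dim": 300,
  "adjacency_matrix_path": "assets/adjacency_matrix.csv",
  "tokenizer": {"lowercase": true, "max_length": 128}
}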


7. Running Inference

7.1 Single Example Inference

Generate a report from two chest X-ray images:

python inference.py example_frontal.png example_lateral.png

Notes:

  • Images must be placed inside the assets/ directory
  • Output is printed to the console
  • Intended for quick qualitative inspection and demonstration

8. Loading the Model Programmatically

import json
import torch
from model import CombinedEncoder, ReportDecoderRNN, load_checkpoint

# Load configuration
with open("config.json") as f:
    cfg = json.load(f)

# Initialize models
encoder = CombinedEncoder(
    embed_dim=cfg["embed_dim"],
    gcn_hidden=cfg["gcn_hidden"],
    gcn_out=cfg["gcn_out"],
    node_feat_dim=cfg["node_feat_dim"]
)

decoder = ReportDecoderRNN(
    embed_dim=cfg["embed_dim"],
    vocab_size=cfg["vocab_size"],
    hidden_dim=cfg["hidden_dim"]
)

# Load trained weights
load_checkpoint("final_model.pth", encoder, decoder)

encoder.eval()
decoder.eval()
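
From here, a generation pass would run under torch.no_grad(); the call signatures below are assumptions for illustration only, since the real interfaces are defined in model.py:

# Hypothetical usage -- encoder/decoder signatures are assumed, not verbatim
with torch.no_grad():
    fused = encoder(frontal_img, lateral_img, node_feats, adj)      # assumed
    report_ids = decoder.generate(fused, max_len=128, top_p=0.9)    # assumed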

9. Training Code

Training and experimentation code is provided in:

notebooks/model_building.ipynb

This notebook includes:

  • Dataset construction
  • Data loaders and collate_fn
  • Training loop
  • Evaluation pipeline
  • Model checkpointing

10. Intended Use

This model is intended for:

  • Academic research
  • Educational purposes
  • Benchmarking radiology report generation methods

⚠️ This model is NOT intended for clinical deployment or medical decision-making.
