🩻 MIMIC-CXR Radiology Report Generator

This repository provides a PyTorch encoder–decoder model for automated radiology report generation using chest X-ray images and structured medical knowledge.

The model is designed to generate clinically coherent free-text radiology reports by jointly reasoning over:

  • Dual-view chest X-ray images (frontal + lateral)
  • Knowledge graph priors encoding structured medical relationships

This work is intended for academic research and educational purposes.

📌 The codebase for building the model is hosted in the following GitHub repository:
https://github.com/shemzegem200/MIMIC-CXR-Medical-Report-Generation-Using-Deep-Learning

📌 Additional resources, such as the pickle file, adjacency matrix, trained model, and grouped dataframe, are available at the following Drive link: https://drive.google.com/drive/folders/1DThVS3wzvL9EbdtBQjFHEWIO8ilqLrvz?usp=sharing


1. Overview

Radiology report generation is a challenging problem that requires the integration of:

  • Accurate visual understanding of medical images
  • Robust clinical language modeling
  • Anatomical and pathological consistency

This project proposes a hybrid architecture that integrates:

  1. Visual features extracted from dual-view chest X-rays
  2. Structured medical knowledge encoded using a Graph Convolutional Network (GCN)
  3. Sequential text generation using an LSTM-based decoder

The model is trained and evaluated on the MIMIC-CXR dataset.


2. Model Architecture

2.1 Visual Encoder

  • Backbone: ResNet-18
  • Input: Two chest X-ray images (frontal + lateral)
  • Output: Fixed-length image embeddings

Each image is processed independently, and the resulting embeddings are combined downstream.
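
A minimal sketch of what this stage could look like, assuming a torchvision ResNet-18 backbone with its classification head stripped (class and variable names here are illustrative, not the repository's actual definitions in model.py):

import torch
import torch.nn as nn
from torchvision import models

class DualViewEncoder(nn.Module):
    """Illustrative dual-view encoder: one shared ResNet-18 backbone
    applied independently to the frontal and lateral images."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()            # strip the classification head
        self.backbone = backbone
        self.proj = nn.Linear(512, embed_dim)  # ResNet-18 features are 512-d

    def forward(self, frontal: torch.Tensor, lateral: torch.Tensor):
        f = self.proj(self.backbone(frontal))  # (B, embed_dim)
        l = self.proj(self.backbone(lateral))  # (B, embed_dim)
        return f, l                            # fused downstream (Section 2.3)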


2.2 Knowledge Graph Encoder

  • Graph representation of medical entities
  • Encoder: Graph Convolutional Network (GCN)
  • Node features derived from textual embeddings
  • Output: Global knowledge embedding via mean pooling

This component provides structured clinical priors that guide report generation.
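
A minimal sketch of such a GCN encoder, assuming a dense adjacency matrix (e.g. loaded from assets/adjacency_matrix.csv) and plain PyTorch rather than a graph library; names are illustrative:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGCN(nn.Module):
    """Illustrative two-layer GCN: node features are propagated over a
    row-normalized adjacency matrix, then mean-pooled into one vector."""
    def __init__(self, node_feat_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.fc1 = nn.Linear(node_feat_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor):
        # x: (num_nodes, node_feat_dim); adj: (num_nodes, num_nodes)
        adj = adj / adj.sum(dim=1, keepdim=True).clamp(min=1e-6)  # row-normalize
        h = F.relu(self.fc1(adj @ x))
        h = self.fc2(adj @ h)
        return h.mean(dim=0)  # global knowledge embedding via mean pooling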


2.3 Multimodal Fusion

The image embeddings and the knowledge graph embedding are concatenated and projected into a unified latent space:

[Image_1 | Image_2 | Knowledge_Graph] -> Linear_Projection -> Shared_Embedding
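
In code, the fusion reduces to a concatenation followed by a linear layer; a sketch with assumed dimensions (the real values live in config.json):

import torch
import torch.nn as nn

embed_dim, kg_dim = 256, 128                 # assumed, see config.json
fuse = nn.Linear(2 * embed_dim + kg_dim, embed_dim)

img1 = torch.randn(1, embed_dim)             # frontal-view embedding
img2 = torch.randn(1, embed_dim)             # lateral-view embedding
kg = torch.randn(1, kg_dim)                  # knowledge-graph embedding

shared = fuse(torch.cat([img1, img2, kg], dim=-1))  # shared embedding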

2.4 Report Decoder

  • Decoder: LSTM-based RNN
  • Input: Multimodal embedding + tokenized report prefix
  • Output: Token-level probability distribution over the vocabulary

Text generation uses nucleus (top-p) sampling with minimum-length constraints.
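
For reference, nucleus sampling with a minimum-length constraint can be written in a few lines; this is a generic sketch, not the exact routine in inference.py (token IDs and thresholds are placeholders):

import torch

def top_p_sample(logits: torch.Tensor, p: float = 0.9,
                 eos_id: int = 2, step: int = 0, min_len: int = 10) -> int:
    """Sample from the smallest token set whose cumulative probability
    reaches p; forbid EOS before min_len decoding steps."""
    if step < min_len:
        logits = logits.clone()
        logits[eos_id] = float("-inf")        # enforce minimum report length
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_ids = probs.sort(descending=True)
    keep = sorted_probs.cumsum(dim=-1) <= p
    keep[0] = True                            # always keep the top token
    kept = sorted_probs[keep] / sorted_probs[keep].sum()
    choice = torch.multinomial(kept, 1).item()
    return sorted_ids[keep][choice].item()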

2.5 Architecture Diagram

The following diagram illustrates the overall architecture of the proposed radiology report generation framework, highlighting the integration of visual features, knowledge graph priors, and sequential text generation.

[Figure: Model architecture]


3. Repository Structure

.
├── model.py                                 # Encoder, GCN, and decoder definitions
├── inference.py                             # Command-line inference & evaluation script
├── utils.py                                 # Dataset, collate_fn, KG utilities
├── config.json                              # Model hyperparameters and paths
├── final_model.pth                          # Trained model weights
├── requirements.txt                         # Python dependencies
│
├── assets/
│   └── adjacency_matrix.csv                 # Knowledge graph structure
│
├── notebooks/
│   ├── model_building.ipynb                 # Training and experimentation notebook
│   └── knowledge_graph_construction.ipynb   # Knowledge graph construction notebook
│
└── README.md

4. Dataset Notice ⚠️

This repository does NOT include the MIMIC-CXR dataset.

  • MIMIC-CXR is subject to PhysioNet credentialed access
  • Redistribution of the dataset is prohibited

To obtain the dataset, visit:
https://physionet.org/content/mimic-cxr/

All training and evaluation code assumes that the user has legally obtained access to the dataset.


5. Installation

5.1 Using Conda (Recommended)

conda create -n mrg python=3.10
conda activate mrg

5.2 Without Conda (Using venv)

python -m venv mrg_env
source mrg_env/bin/activate        # Linux / macOS
mrg_env\Scripts\activate           # Windows

5.3 Install Dependencies

Using requirements.txt:

pip install -r requirements.txt

6. Configuration

Model and inference parameters are defined in config.json, including:

  • Embedding dimensions
  • Vocabulary size
  • Knowledge graph paths
  • Tokenizer configuration

Users typically do not need to modify this file for inference.
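
For reference, the file follows roughly this shape; the values below are placeholders, not the shipped configuration (the key names mirror those read in Section 8, while the path and tokenizer entries are assumptions):

{
  "embed_dim": 256,
  "vocab_size": 10000,
  "hidden_dim": 512,
  "gcn_hidden": 128,
  "gcn_out": 128,
  "node_feat_dim": 300,
  "adjacency_matrix_path": "assets/adjacency_matrix.csv",
  "tokenizer": {"lowercase": true, "max_length": 128}
}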


7. Running Inference

7.1 Single Example Inference

Generate a report from two chest X-ray images:

python inference.py example_frontal.png example_lateral.png

Notes:

  • Images must be placed inside the assets/ directory
  • Output is printed to the console
  • Intended for quick qualitative inspection and demonstration

8. Loading the Model Programmatically

import json
import torch
from model import CombinedEncoder, ReportDecoderRNN, load_checkpoint

# Load configuration
with open("config.json") as f:
    cfg = json.load(f)

# Initialize models
encoder = CombinedEncoder(
    embed_dim=cfg["embed_dim"],
    gcn_hidden=cfg["gcn_hidden"],
    gcn_out=cfg["gcn_out"],
    node_feat_dim=cfg["node_feat_dim"]
)

decoder = ReportDecoderRNN(
    embed_dim=cfg["embed_dim"],
    vocab_size=cfg["vocab_size"],
    hidden_dim=cfg["hidden_dim"]
)

# Load trained weights
load_checkpoint("final_model.pth", encoder, decoder)

encoder.eval()
decoder.eval()
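
From here, a generation pass would run under torch.no_grad(); the call signatures below are assumptions for illustration only, since the real interfaces are defined in model.py:

# Hypothetical usage -- encoder/decoder signatures are assumed, not verbatim
with torch.no_grad():
    fused = encoder(frontal_img, lateral_img, node_feats, adj)      # assumed
    report_ids = decoder.generate(fused, max_len=128, top_p=0.9)    # assumed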

9. Training Code

Training and experimentation code is provided in:

notebooks/model_building.ipynb

This notebook includes:

  • Dataset construction
  • Data loaders and collate_fn
  • Training loop
  • Evaluation pipeline
  • Model checkpointing

10. Intended Use

This model is intended for:

  • Academic research
  • Educational purposes
  • Benchmarking radiology report generation methods

⚠️ This model is NOT intended for clinical deployment or medical decision-making.
