---
language: en
license: mit
library_name: transformers
tags:
- image-to-text
---

Requirements:

```bash
pip install opencv-python
pip install albumentations
pip install accelerate
pip install torch==2.2.1 transformers==4.39.0  # more recent versions may also work
```

Adapted sample script for structured radiology report generation (SRRG):

```python
import tempfile

import requests
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# step 1: Set up constants
model_name = "StanfordAIMI/CheXagent-2-3b-srrg-impression"
dtype = torch.bfloat16
device = "cuda"

# step 2: Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=True)
model = model.to(dtype)
model.eval()

# step 3: Download image from URL, save to a local file, and prepare path list
url = "https://huggingface.co/IAMJB/interpret-cxr-impression-baseline/resolve/main/effusions-bibasal.jpg"
resp = requests.get(url)
resp.raise_for_status()

# Use a NamedTemporaryFile with delete=False so the image stays on disk after the block exits
with tempfile.NamedTemporaryFile(delete=False, suffix=".jpg") as tmpfile:
    tmpfile.write(resp.content)
    local_path = tmpfile.name  # this is a real file path on disk

paths = [local_path]
prompt = "Structured Radiology Report Generation for Impression Section"

# build the multimodal input
query = tokenizer.from_list_format(
    [*([{"image": img} for img in paths]), {"text": prompt}]
)

# format as a chat conversation
conv = [
    {"from": "system", "value": "You are a helpful assistant."},
    {"from": "human", "value": query},
]

# tokenize and generate
input_ids = tokenizer.apply_chat_template(
    conv, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(
    input_ids.to(device),
    do_sample=False,
    num_beams=1,
    temperature=1.0,
    top_p=1.0,
    use_cache=True,
    max_new_tokens=512,
)[0]

# decode the “impression” text: skip the prompt tokens and the trailing token
# (typically the end-of-sequence marker)
response = tokenizer.decode(output[input_ids.size(1) : -1])
print(response)
```

Response:
```
1. Interval increase in bilateral pleural effusions.
2. Interval increase in bibasilar opacities, which may represent atelectasis or consolidation.
3. Stable position of the right upper extremity peripherally inserted central catheter (PICC line).
```
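
For downstream use, the numbered impression can be split into individual findings. The snippet below is a minimal sketch and not part of the model's API: `split_impression` is a hypothetical helper, and it assumes the response uses `N. ` markers like the sample above, separated by spaces or newlines. A sentence that happens to end in a one- or two-digit number followed by a period would be misread as a marker, so treat it as a starting point.

```python
import re


def split_impression(text: str) -> list[str]:
    """Split a numbered impression string into its individual findings."""
    # Split on "N. " markers that sit at the start of the text or after whitespace.
    parts = re.split(r"(?:^|\s+)\d{1,2}\.\s+", text.strip())
    # Drop the empty piece that precedes the first marker.
    return [p.strip() for p in parts if p.strip()]


# Sample response copied from the output shown above.
example = (
    "1. Interval increase in bilateral pleural effusions. "
    "2. Interval increase in bibasilar opacities, which may represent "
    "atelectasis or consolidation. "
    "3. Stable position of the right upper extremity peripherally inserted "
    "central catheter (PICC line)."
)

findings = split_impression(example)
for item in findings:
    print("-", item)
```

In the script above, the same helper could be called as `split_impression(response)` once generation has finished.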