Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
Paper: https://arxiv.org/abs/2504.21561
This repository contains the LoRA checkpoint for SPORT, a framework that enables multimodal agents to improve iteratively through self-generated tasks and preference-based optimization. We fine-tuned Qwen2-VL-7B-Instruct with LoRA adapters using Direct Preference Optimization (DPO) on step-wise preference pairs, making the model more effective at reasoning about multimodal tasks and better aligned with preference signals.
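The DPO objective behind this checkpoint can be sketched for a single preference pair. This is an illustrative, self-contained sketch, not the repository's training code: `dpo_loss` and its log-probability inputs are hypothetical names, and the inputs stand in for summed response log-probabilities under the policy and a frozen reference model.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a full response
    under the trainable policy or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): shrinks as the policy widens the margin
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy favors the chosen response -> lower loss
print(dpo_loss(-5.0, -9.0, -6.0, -8.0))
# Policy favors the rejected response -> higher loss
print(dpo_loss(-9.0, -5.0, -8.0, -6.0))
```

Minimizing this loss pushes the policy to assign a higher relative likelihood to the preferred step than the reference model does, without needing an explicit reward model.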
On the GTA benchmark, SPORT demonstrates consistent improvements over strong baselines.
To run inference, load the adapter on top of the base model. Note that Qwen2-VL is a vision-language model, so it uses its dedicated model class and processor rather than `AutoModelForCausalLM`:

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

base_model = "Qwen/Qwen2-VL-7B-Instruct"
lora_ckpt = "your-hf-username/SPORT-LoRA-7B"

# The processor handles both text tokenization and image preprocessing
processor = AutoProcessor.from_pretrained(base_model)
model = Qwen2VLForConditionalGeneration.from_pretrained(base_model, device_map="auto")

# Attach the SPORT LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(model, lora_ckpt)
```
If you use SPORT or this checkpoint in your research, please cite:
@inproceedings{li2025iterative,
title={Iterative Trajectory Exploration for Multimodal Agents},
author={Li, Pengxiang and Gao, Zhi and Zhang, Bofei and Mi, Yapeng and Ma, Xiaojian and Shi, Chenrui and Yuan, Tao and Wu, Yuwei and Jia, Yunde and Zhu, Song-Chun and Li, Qing},
year={2025},
eprint={2504.21561},
archivePrefix={arXiv},
url={https://arxiv.org/abs/2504.21561},
}
⚠️ Note: This repository only provides LoRA weights. You must load them on top of the base Qwen2-VL-7B-Instruct model for inference.