Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
Paper: https://arxiv.org/abs/2504.21561
This repository contains the LoRA checkpoint for SPORT, a framework that enables multimodal agents to improve iteratively through self-generated tasks and preference-based optimization. We fine-tuned Qwen2-VL-7B-Instruct with LoRA adapters using Direct Preference Optimization (DPO) on step-wise preference pairs, making the model more effective at reasoning about multimodal tasks and better aligned with preference signals.
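The DPO objective behind this checkpoint can be sketched for a single preference pair. This is an illustrative, self-contained sketch, not the repository's training code: `dpo_loss` and its log-probability inputs are hypothetical names, and the inputs stand in for summed response log-probabilities under the policy and a frozen reference model.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a full response
    under the trainable policy or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): shrinks as the policy widens the margin
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy favors the chosen response -> lower loss
print(dpo_loss(-5.0, -9.0, -6.0, -8.0))
# Policy favors the rejected response -> higher loss
print(dpo_loss(-9.0, -5.0, -8.0, -6.0))
```

Minimizing this loss pushes the policy to assign a higher relative likelihood to the preferred step than the reference model does, without needing an explicit reward model.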
On the GTA benchmark, SPORT demonstrates consistent improvements over strong baselines.
To run inference, load the adapter on top of the base model. Note that Qwen2-VL is a vision-language model, so it uses its dedicated model class and processor rather than `AutoModelForCausalLM`:

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

base_model = "Qwen/Qwen2-VL-7B-Instruct"
lora_ckpt = "your-hf-username/SPORT-LoRA-7B"

# The processor handles both text tokenization and image preprocessing
processor = AutoProcessor.from_pretrained(base_model)
model = Qwen2VLForConditionalGeneration.from_pretrained(base_model, device_map="auto")

# Attach the SPORT LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(model, lora_ckpt)
```
If you use SPORT or this checkpoint in your research, please cite:
@inproceedings{li2025iterative,
title={Iterative Trajectory Exploration for Multimodal Agents},
author={Li, Pengxiang and Gao, Zhi and Zhang, Bofei and Mi, Yapeng and Ma, Xiaojian and Shi, Chenrui and Yuan, Tao and Wu, Yuwei and Jia, Yunde and Zhu, Song-Chun and Li, Qing},
year={2025},
eprint={2504.21561},
archivePrefix={arXiv},
url={https://arxiv.org/abs/2504.21561},
}
⚠️ Note: This repository only provides LoRA weights. You must load them on top of the base Qwen2-VL-7B-Instruct model for inference.