# Qwen2.5-1.5B Browser Action LoRA
This is a LoRA adapter fine-tuned from Qwen/Qwen2.5-1.5B-Instruct for browser-use action prediction.
## Training objective
The model was trained on step-level action-only browser-agent supervision. Each example contains:
- a system prompt derived from the original BrowserGym teacher prompt
- the current task goal, URL, short recent history, and observation text
- the next BrowserGym action as the assistant target
The goal is not broad open-web generality. This is a scoped research model for testing whether a small model can improve on synthetic browser-use tasks after SFT.
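The step-level supervision described above can be sketched as a single chat-format training row. The field names and the concrete action below are illustrative assumptions, not taken from the actual dataset:

```python
# Sketch of one training example in chat format. The prompt wording, the
# observation snippet, and the target action are all hypothetical.
example = {
    "messages": [
        {
            "role": "system",
            "content": "You are a browser agent. Reply with exactly one BrowserGym action.",
        },
        {
            "role": "user",
            "content": (
                "Goal: log in to the demo site\n"
                "URL: https://example.com/login\n"
                "History: fill('username', 'alice')\n"
                "Observation: <textbox id=42 label='Password'> ..."
            ),
        },
        # The assistant target is the next BrowserGym action, nothing else.
        {"role": "assistant", "content": "fill('42', 'secret')"},
    ]
}
```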
## Dataset
Dataset summary:
- 6,508 train rows
- 240 validation rows
- a strictly filtered export from a larger collection corpus of 10k+ steps
## Fine-tuning setup
- Base model: Qwen/Qwen2.5-1.5B-Instruct
- Method: PEFT LoRA
- Epochs: 1
- Sequence length: 2048
- Learning rate: 2e-4
- Batch size: 4
- Gradient accumulation: 4
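The hyperparameters above map onto a standard PEFT + Trainer setup. This is a minimal sketch: the LoRA rank, alpha, dropout, and target modules are assumptions, since the card does not list them.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA rank/alpha/dropout and target modules are assumed values,
# not taken from the card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# These values mirror the table above.
training_args = TrainingArguments(
    output_dir="qwen25-browser-action-lora",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size 16
    learning_rate=2e-4,
    bf16=True,
)
```

With a sequence length cap of 2048, examples longer than the cap would need truncation on the observation side so the action target survives intact.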
## Evaluation
Validation set size: 240
Before fine-tuning:
- Parseable action rate: 100%
- Exact-match action accuracy: 17.08%
After fine-tuning:
- Parseable action rate: 100%
- Exact-match action accuracy: 79.58%
This is an absolute improvement of over 62 points in exact-match accuracy over the untuned base model on the target task distribution.
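The two reported metrics can be computed as below. This is a minimal sketch: `parse_ok` is a stand-in for whatever parser validates BrowserGym action syntax, and the sample actions are invented.

```python
# Minimal sketch of the two reported validation metrics.
def evaluate(predictions, references, parse_ok=lambda a: bool(a.strip())):
    n = len(references)
    parseable = sum(parse_ok(p) for p in predictions)
    # Exact match after whitespace stripping; still strict otherwise.
    exact = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return {"parseable_rate": parseable / n, "exact_match": exact / n}

metrics = evaluate(
    ["click('12')", "fill('3', 'x')", "click('99')"],
    ["click('12')", "fill('3', 'x')", "scroll(0, 200)"],
)
# metrics["parseable_rate"] == 1.0, metrics["exact_match"] == 2/3 here
```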
## Usage
Load as a PEFT adapter on top of the base model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model_id = "Qwen/Qwen2.5-1.5B-Instruct"
adapter_id = "saital/qwen25-1.5b-browser-action-lora"

# 4-bit NF4 quantization keeps the 1.5B base model well within a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(model, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
```
## Limitations
- Evaluated only on the project's synthetic browser-task distribution.
- Exact-match evaluation is strict and may undercount formatting-equivalent actions.
- This adapter is intended for research iteration, not production deployment.
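The exact-match caveat can be made concrete: two actions can be semantically identical while differing only in quoting or whitespace. The normalizer below is a hypothetical relaxation for illustration, not part of the reported evaluation.

```python
# Two semantically identical actions that strict exact match treats as different.
pred, ref = 'click("123")', "click('123')"

def normalize(action: str) -> str:
    # Hypothetical relaxation: unify quote style and drop whitespace.
    return action.replace('"', "'").replace(" ", "")

assert pred != ref                        # strict exact match: counted as a miss
assert normalize(pred) == normalize(ref)  # relaxed match: counted as a hit
```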