# Qwen2.5-1.5B Browser Action LoRA
This is a LoRA adapter fine-tuned from Qwen/Qwen2.5-1.5B-Instruct for browser-use action prediction.
## Training objective
The model was trained on step-level action-only browser-agent supervision. Each example contains:
- a system prompt derived from the original BrowserGym teacher prompt
- the current task goal, URL, short recent history, and observation text
- the next BrowserGym action as the assistant target
The goal is not broad open-web generality. This is a scoped research model for testing whether a small model can improve on synthetic browser-use tasks after SFT.
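The step-level supervision described above can be sketched as a single chat-format training row. The field names and the concrete action below are illustrative assumptions, not taken from the actual dataset:

```python
# Sketch of one training example in chat format. The prompt wording, the
# observation snippet, and the target action are all hypothetical.
example = {
    "messages": [
        {
            "role": "system",
            "content": "You are a browser agent. Reply with exactly one BrowserGym action.",
        },
        {
            "role": "user",
            "content": (
                "Goal: log in to the demo site\n"
                "URL: https://example.com/login\n"
                "History: fill('username', 'alice')\n"
                "Observation: <textbox id=42 label='Password'> ..."
            ),
        },
        # The assistant target is the next BrowserGym action, nothing else.
        {"role": "assistant", "content": "fill('42', 'secret')"},
    ]
}
```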
## Dataset
Dataset summary:
- 6,508 train rows
- 240 validation rows
- a strictly filtered export from a larger collection corpus of 10k+ steps
## Fine-tuning setup
- Base model: Qwen/Qwen2.5-1.5B-Instruct
- Method: PEFT LoRA
- Epochs: 1
- Sequence length: 2048
- Learning rate: 2e-4
- Batch size: 4
- Gradient accumulation: 4
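The hyperparameters above map onto a standard PEFT + Trainer setup. This is a minimal sketch: the LoRA rank, alpha, dropout, and target modules are assumptions, since the card does not list them.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA rank/alpha/dropout and target modules are assumed values,
# not taken from the card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# These values mirror the table above.
training_args = TrainingArguments(
    output_dir="qwen25-browser-action-lora",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size 16
    learning_rate=2e-4,
    bf16=True,
)
```

With a sequence length cap of 2048, examples longer than the cap would need truncation on the observation side so the action target survives intact.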
## Evaluation
Validation set size: 240
Before fine-tuning:
- Parseable action rate: 100%
- Exact-match action accuracy: 17.08%
After fine-tuning:
- Parseable action rate: 100%
- Exact-match action accuracy: 79.58%
This is an absolute improvement of over 62 points in exact-match accuracy over the untuned base model on the target task distribution.
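The two reported metrics can be computed as below. This is a minimal sketch: `parse_ok` is a stand-in for whatever parser validates BrowserGym action syntax, and the sample actions are invented.

```python
# Minimal sketch of the two reported validation metrics.
def evaluate(predictions, references, parse_ok=lambda a: bool(a.strip())):
    n = len(references)
    parseable = sum(parse_ok(p) for p in predictions)
    # Exact match after whitespace stripping; still strict otherwise.
    exact = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return {"parseable_rate": parseable / n, "exact_match": exact / n}

metrics = evaluate(
    ["click('12')", "fill('3', 'x')", "click('99')"],
    ["click('12')", "fill('3', 'x')", "scroll(0, 200)"],
)
# metrics["parseable_rate"] == 1.0, metrics["exact_match"] == 2/3 here
```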
## Usage
Load as a PEFT adapter on top of the base model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model_id = "Qwen/Qwen2.5-1.5B-Instruct"
adapter_id = "saital/qwen25-1.5b-browser-action-lora"

# 4-bit NF4 quantization keeps the 1.5B base model well within a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(model, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
```
## Limitations
- Evaluated only on the project's synthetic browser-task distribution.
- Exact-match evaluation is strict and may undercount formatting-equivalent actions.
- This adapter is intended for research iteration, not production deployment.
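The exact-match caveat can be made concrete: two actions can be semantically identical while differing only in quoting or whitespace. The normalizer below is a hypothetical relaxation for illustration, not part of the reported evaluation.

```python
# Two semantically identical actions that strict exact match treats as different.
pred, ref = 'click("123")', "click('123')"

def normalize(action: str) -> str:
    # Hypothetical relaxation: unify quote style and drop whitespace.
    return action.replace('"', "'").replace(" ", "")

assert pred != ref                        # strict exact match: counted as a miss
assert normalize(pred) == normalize(ref)  # relaxed match: counted as a hit
```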