# Qwen3‑VL‑8B ChartQA (LoRA)
## Overview
This repository contains a Qwen3‑VL‑8B‑Instruct vision‑language model fine‑tuned to answer questions about charts and plots, focusing on concise numerical or short textual answers.
Fine‑tuning was performed via LoRA using the human‑annotated subset of the `HuggingFaceM4/ChartQA` dataset (train split, `human_or_machine = human`).
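A minimal sketch of that data selection with 🤗 Datasets, assuming (per the description above) that the `human_or_machine` column stores the string `"human"` for human‑authored pairs:

```python
# Sketch of the training-data selection described above. Assumes the
# human_or_machine column holds the string "human" for human-authored pairs.
from datasets import load_dataset

ds = load_dataset("HuggingFaceM4/ChartQA", split="train")
human_ds = ds.filter(lambda ex: ex["human_or_machine"] == "human")

# Each example pairs a chart image with a question; the first reference
# answer (label[0]) serves as the supervision target (see Training details).
ex = human_ds[0]
print(ex["query"], "->", ex["label"][0])
```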
Typical behavior:

- Input: an image of a bar chart and the question "What is the value of the blue bar in 2018?"
  Output: `24`
- Input: an image of a line chart and the question "In which year does the orange line reach its maximum?"
  Output: `2015`
- Input: an image of a pie chart and the question "What percentage corresponds to Sales?"
  Output: `38%`
The LoRA adapter was trained with LLaMA‑Factory on top of `Qwen/Qwen3-VL-8B-Instruct` and can be loaded either as a standard Transformers adapter or merged into the base weights.
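A minimal loading sketch with Transformers and PEFT, assuming a recent install of both; the adapter repository ID below is a placeholder for this repo:

```python
# Minimal sketch: load the base model and attach the LoRA adapter.
# "your-org/qwen3-vl-8b-chartqa-lora" is a placeholder for this repo's ID.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel

base_id = "Qwen/Qwen3-VL-8B-Instruct"
adapter_id = "your-org/qwen3-vl-8b-chartqa-lora"  # placeholder

processor = AutoProcessor.from_pretrained(base_id)
model = AutoModelForImageTextToText.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Alternatively, merge the LoRA weights into the base model for plain
# Transformers inference without PEFT at runtime:
# model = model.merge_and_unload()
```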
## Base model
- Base: `Qwen/Qwen3-VL-8B-Instruct`
- Architecture: multimodal vision‑language model, ~8.8B parameters
- Intended use: instruction‑following and visual question answering (images + text)
## Training details
- Framework: LLaMA‑Factory (Supervised Fine‑Tuning with LoRA)
- Fine‑tuning type: LoRA on transformer linear layers, with the vision tower and projector frozen
- Dataset: `HuggingFaceM4/ChartQA` (train split, only human‑authored QA pairs)
- Task: single‑turn chart question answering (chart image + question → short answer)
- Input format: Qwen3‑VL chat template with `<|im_start|>user` / `<|im_start|>assistant` and `<|vision_start|>…<|vision_end|>` tokens; the first reference answer (`label[0]`) of each sample is used as the target (a schematic rendering follows this list)
- Number of train examples: 7,398 human‑annotated samples
- Max sequence length: 2048 tokens
- Epochs: 3
- Batch / grad accumulation: effective batch size 64 (multi‑GPU + gradient accumulation)
- Learning rate: 5e‑5 (AdamW optimizer with a learning‑rate scheduler)
- Precision: mixed precision (FP16 / bfloat16) with gradient checkpointing
- Trainable parameters: ~21.8M LoRA params (≈0.25 % of 8.79B total)
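A schematic view of one templated training sample (illustrative; the actual image placeholder tokens between the vision markers are inserted by the processor):

```
<|im_start|>user
<|vision_start|>…<|vision_end|>What is the value of the blue bar in 2018?<|im_end|>
<|im_start|>assistant
24<|im_end|>
```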
The final training loss was around 0.32 after 3 epochs (~10.6M tokens seen), indicating a strong fit on ChartQA while updating only a small set of LoRA parameters.
## Usage

For best results:
- Provide a single chart image and a clear question in one turn.
- Use `temperature` in the 0.0–0.2 range and `max_new_tokens` around 16–64 (see the inference sketch after this list).
- Expect short answers (numbers, years, category names) rather than long explanations.
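A minimal inference sketch under those settings, reusing `model` and `processor` from the loading snippet above; the image path and question are illustrative:

```python
# Minimal sketch: greedy, short-answer decoding as recommended above.
from PIL import Image

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": Image.open("chart.png")},  # illustrative path
            {"type": "text", "text": "What is the value of the blue bar in 2018?"},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# do_sample=False gives deterministic decoding (equivalent to temperature 0).
output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0].strip()
print(answer)  # e.g. "24"
```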
## Limitations
- The model is specialized for chart question answering and is not a general‑purpose assistant.
- It may struggle with non‑chart images, highly stylized plots, or layouts very different from those in ChartQA.
- Numerical and logical reasoning quality is bounded by the underlying Qwen3‑VL‑8B model; answers used in analytical or reporting workflows should be manually verified.