# RB-QWEN-3B-16DS-LORA
ReasonBorn v3 is a specialized LoRA (Low-Rank Adaptation) adapter for Qwen2.5-3B, fine-tuned for structured multi-step reasoning. It is trained to decompose complex problems into a specific XML-based reasoning chain (Plan → Step-by-Step Reasoning → Verified Conclusion).
This version was trained using the "MI300X Bulletproof Edition v3" pipeline, optimized for high-throughput compute on AMD Instinct MI300X (192GB) hardware.
## 📋 Model Details
- Developed by: Soham (Phase-Technologies / Xerv-AI)
- Model Type: PeftAdapter (LoRA)
- Base Model: Qwen/Qwen2.5-3B
- Training Architecture: Causal Language Modeling with structured CoT (Chain-of-Thought).
- Format: ChatML with custom XML reasoning tags.
## 🧠 Reasoning Format
The model is strictly formatted to follow the ReasonBorn protocol. It will wrap its thoughts in the following structure:
```
<|im_start|>system
You are ReasonBorn. Output only: <plan>,<reasoning><step>...</step></reasoning>,<conclusion>\boxed{}.<|im_end|>
<|im_start|>user
{question}<|im_end|>
<|im_start|>assistant
<plan>Decompose→reason→verify→conclude.</plan>
<reasoning>
<step index="1">Observation and initial approach.</step>
<step index="2">Calculation or logical derivation.</step>
...
<step index="n">Final verification.<verify>ok</verify></step>
</reasoning>
<conclusion>\boxed{result}</conclusion><|im_end|>
```
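For programmatic use, the template above can be assembled with a small helper. This is a plain-string sketch; `build_reasonborn_prompt` is a hypothetical name, not part of the released code:

```python
def build_reasonborn_prompt(question: str) -> str:
    """Assemble the ChatML prompt expected by the ReasonBorn adapter."""
    system = (
        "You are ReasonBorn. Output only: <plan>,<reasoning><step>...</step>"
        "</reasoning>,<conclusion>\\boxed{}."
    )
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
```

The prompt ends right after the assistant header, so generation begins directly with the `<plan>` tag.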
## 📊 Training Data Mix (16DS Mixture)
The model was trained on a curated mix of ~200k samples across 11 high-quality datasets to balance math, science, and general logic:
| Dataset | Samples | Focus |
|---|---|---|
| NuminaMath-CoT | 60,000 | Advanced Math Reasoning |
| OrcaMath | 60,000 | Word Problems |
| UltraMath-Conv | 50,000 | Synthetic Conversation Math |
| SciQ / OpenBookQA | ~16,000 | General Science |
| GSM8K | 7,473 | Grade School Math |
| AI2_ARC (Challenge) | 7,500 | Hard Science Questions |
| Xerv-AI/GRAD | 1,933 | Graduate Level Maths |
| GPQA / HLE / ChemQA | ~7,000 | Expert-level Logic & Chemistry |
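As a sanity check, the per-dataset caps from the training script's `DATA_MIX` sum to roughly 210k; the ~200k figure above presumably reflects what survives the length and format filters:

```python
# Nominal per-dataset sample caps (mirroring DATA_MIX in the training script)
caps = {
    "NuminaMath": 60000, "OrcaMath": 60000, "UltraMath-Conv": 50000,
    "GSM8K": 7473, "AI2_ARC": 7500, "SciQ": 11679, "OpenBookQA": 4957,
    "GPQA": 198, "ChemistryQA": 4000, "HLE": 2700, "GRAD": 1933,
}
print(sum(caps.values()))  # 210440 before filtering
```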
## ⚙️ Training Hyperparameters
Optimized for the AMD MI300X for sub-6-hour completion:
- GPU: 1x AMD MI300X 192GB
- Precision: bf16 (BFloat16)
- Optimizer: adamw_torch_fused
- Learning Rate: 2.5e-4 (cosine schedule)
- Epochs: 1.15
- Batch Size: 48 (global batch: 96 with grad accum 2)
- Max Context: 512 tokens
- LoRA Config:
  - Rank (r): 16
  - Alpha: 32
  - Target Modules: all linear layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
  - Dropout: 0.05
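To get a feel for the adapter size: a rank-r LoRA on a `d_in × d_out` linear layer adds `r * (d_in + d_out)` trainable weights (matrix A is r×d_in, matrix B is d_out×r). A minimal sketch, where the real dimensions would come from the Qwen2.5-3B config (the 2048 below is only an illustrative hidden size):

```python
def lora_params(d_in: int, d_out: int, r: int = 16) -> int:
    """Trainable parameters added by one rank-r LoRA pair (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)

# Illustrative: a square 2048x2048 projection at rank 16
print(lora_params(2048, 2048))  # 65536
```

Summed over all targeted projections in all layers, this is typically well under 1% of the base model's 3B parameters.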
## 🛠️ Usage (Inference)
Since this is a LoRA adapter, you must load the base model first:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Setup IDs
base_model_id = "Qwen/Qwen2.5-3B"
adapter_id = "Xerv-AI/rb-qwen3b-16ds-lora"

# 2. Load Tokenizer and Base Model
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,  # Or float16 if no BF16 support
    device_map="auto",
)

# 3. Load the LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# 4. Prepare the prompt (matching the ReasonBorn template)
question = "Solve x^3 - 6x^2 + 11x - 6 = 0 for real roots."
prompt = (
    "<|im_start|>system\n"
    "You are ReasonBorn. Output only: <plan>,<reasoning><step>...</step></reasoning>,<conclusion>\\boxed{}.\n"
    "<|im_end|>\n"
    f"<|im_start|>user\n{question}<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# 5. Generate
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.2,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# 6. Decode
result = tokenizer.decode(output[0], skip_special_tokens=True)
print(result.split("assistant\n")[-1])
```
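The decoded completion can then be parsed back into structured fields. A regex sketch; `parse_reasonborn` is a hypothetical helper, not shipped with the adapter:

```python
import re

def parse_reasonborn(output: str) -> dict:
    """Extract the plan, step list, and boxed answer from a ReasonBorn completion."""
    plan = re.search(r"<plan>(.*?)</plan>", output, re.DOTALL)
    steps = re.findall(r'<step index="\d+">(.*?)</step>', output, re.DOTALL)
    boxed = re.search(r"\\boxed\{(.*?)\}", output, re.DOTALL)
    return {
        "plan": plan.group(1).strip() if plan else None,
        "steps": [s.strip() for s in steps],
        "answer": boxed.group(1).strip() if boxed else None,
    }
```

Note that the final step retains its `<verify>ok</verify>` tag inside the captured text, which can be checked separately if desired.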
## Training Script
```python
import os
import gc
import re
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

import torch
from huggingface_hub import login, HfApi
from datasets import load_dataset, Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
)
from peft import LoraConfig, get_peft_model

os.environ["TOKENIZERS_PARALLELISM"] = "false"

MODEL_ID = "Qwen/Qwen2.5-3B"
REPO_NAME = "rb-qwen3b-16ds-lora"
SAVE_DIR = "./rb-qwen-16ds-lora-final"
MAX_CTX = 512
EPOCHS = 1.15
LR = 2.5e-4
LORA_R = 16
LORA_ALPHA = 32
BATCH_SIZE = 48
GRAD_ACCUM = 2
WORKERS = 12

DATA_MIX = {
    "NuminaMath": {"path": "AI-MO/NuminaMath-CoT", "max_samples": 60000, "split": "train"},
    "OrcaMath": {"path": "microsoft/orca-math-word-problems-200k", "max_samples": 60000, "split": "train"},
    "UltraMath-Conv": {"path": "openbmb/UltraData-Math", "config": "UltraData-Math-L3-Conversation-Synthetic", "max_samples": 50000, "split": "train"},
    "GSM8K": {"path": "openai/gsm8k", "config": "main", "max_samples": 7473, "split": "train"},
    "AI2_ARC": {"path": "allenai/ai2_arc", "config": "ARC-Challenge", "max_samples": 7500, "split": "train"},
    "SciQ": {"path": "sciq", "max_samples": 11679, "split": "train"},
    "OpenBookQA": {"path": "openbookqa", "max_samples": 4957, "split": "train"},
    "GPQA": {"path": "Idavidrein/gpqa", "config": "gpqa_diamond", "max_samples": 198, "split": "train"},
    "ChemistryQA": {"path": "avaliev/ChemistryQA", "max_samples": 4000, "split": "train"},
    "HLE": {"path": "cais/hle", "max_samples": 2700, "split": "test"},
    "GRAD": {"path": "Xerv-AI/GRAD", "max_samples": 1933, "split": "train"},
}

def format_example(ex):
    """Convert a raw QA example into the ReasonBorn ChatML + XML training format."""
    try:
        q = str(ex.get("question") or ex.get("problem") or ex.get("prompt") or "").strip()
        s = str(ex.get("answer") or ex.get("solution") or ex.get("response") or "").strip()
        if len(q) < 5 or len(s) < 5:
            return None
        boxed = re.search(r'\\boxed\{(.*?)\}', s, re.DOTALL)
        ans = boxed.group(1).strip() if boxed else s[:80]
        reasoning = re.sub(r'\\boxed\{.*?\}', '', s, flags=re.DOTALL).strip()
        steps = [l.strip() for l in reasoning.split('\n') if len(l.strip()) > 8][:5]
        xml = "<plan>Decompose→reason→verify→conclude.</plan>\n<reasoning>\n"
        for i, step in enumerate(steps, 1):
            v = "<verify>ok</verify>" if i == len(steps) else ""
            xml += f'<step index="{i}">{step}{v}</step>\n'
        xml += f"</reasoning>\n<conclusion>\\boxed{{{ans}}}</conclusion>"
        sys_p = "You are ReasonBorn. Output only: <plan>,<reasoning><step>...</step></reasoning>,<conclusion>\\boxed{}."
        return {"text": (
            f"<|im_start|>system\n{sys_p}<|im_end|>\n"
            f"<|im_start|>user\n{q}<|im_end|>\n"
            f"<|im_start|>assistant\n{xml}<|im_end|>"
        )}
    except Exception:
        return None

def load_one(name, cfg):
    """Load one dataset, falling back to streaming mode if a full download fails."""
    examples = []
    kwargs = {"split": cfg["split"], "trust_remote_code": True}
    if "config" in cfg:
        kwargs["name"] = cfg["config"]
    try:
        ds = load_dataset(cfg["path"], **kwargs)
        if len(ds) > cfg["max_samples"]:
            ds = ds.select(range(cfg["max_samples"]))
        for ex in ds:
            r = format_example(ex)
            if r:
                examples.append(r)
        return name, examples, "ok"
    except Exception:
        pass
    try:
        ds = load_dataset(cfg["path"], streaming=True, **kwargs)
        for ex in ds:
            if len(examples) >= cfg["max_samples"]:
                break
            r = format_example(ex)
            if r:
                examples.append(r)
        return name, examples, "stream"
    except Exception:
        return name, [], "failed"

login()

all_ex = []
with ThreadPoolExecutor(max_workers=6) as pool:
    futs = {pool.submit(load_one, n, c): n for n, c in DATA_MIX.items()}
    for fut in as_completed(futs):
        n, exs, status = fut.result()
        all_ex.extend(exs)

train_ds = Dataset.from_list(all_ex).shuffle(seed=42)
del all_ex
gc.collect()

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

tokenized = train_ds.map(
    lambda b: tokenizer(b["text"], truncation=True, max_length=MAX_CTX, padding=False),
    batched=True, batch_size=4000, num_proc=16,
    remove_columns=["text"],
)
tokenized = tokenized.filter(lambda x: len(x["input_ids"]) >= 8, num_proc=16)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    attn_implementation="eager",
)
model = model.to("cuda")
torch.cuda.synchronize()
model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})
model.enable_input_require_grads()

model = get_peft_model(model, LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
))

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="./chk",
    num_train_epochs=EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=GRAD_ACCUM,
    gradient_checkpointing=True,
    optim="adamw_torch_fused",
    learning_rate=LR,
    bf16=True,
    fp16=False,
    logging_steps=25,
    save_strategy="steps",
    save_steps=500,
    save_total_limit=2,
    warmup_ratio=0.05,
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    max_grad_norm=0.5,
    dataloader_num_workers=WORKERS,
    dataloader_pin_memory=True,
    dataloader_prefetch_factor=4,
    report_to="none",
    remove_unused_columns=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()

os.makedirs(SAVE_DIR, exist_ok=True)
trainer.save_model(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```
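For intuition, the `warmup_ratio=0.05` plus `lr_scheduler_type="cosine"` configuration above corresponds to a learning-rate curve like the following. This is a standalone sketch of the standard warmup-then-cosine shape, not the exact Hugging Face implementation:

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float = 2.5e-4,
          warmup_ratio: float = 0.05) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup = max(1, int(total_steps * warmup_ratio))
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

At 5% of total steps the rate peaks at 2.5e-4, then decays smoothly toward zero at the scheduled end of training.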
## ⚠️ Limitations
This is a 3B-parameter model and may hallucinate on extremely complex mathematical proofs. It is strictly optimized for the XML reasoning format; using standard chat templates may result in lower performance.
## 💡 Technical Note: Early Stopping & Convergence
This model was originally scheduled for a full 2-epoch run. However, due to a kernel disconnection on the MI300X training environment, the training was interrupted at Step 1000 (approximately 1.6 epochs).
Upon internal benchmarking and inference testing, this "early-stopped" checkpoint (Step 1000) demonstrated:
- Generalization: The model successfully avoided the "parroting" phase often seen in later epochs of LoRA fine-tuning.
- Format Adherence: Despite the interruption, the model reached full convergence on the ReasonBorn XML tag structure.
- Mathematical Integrity: Faithful reasoning on multi-step calculations.
We have elected to release this checkpoint as the final release.