rapidfire-ai-inc/Mistral-7B-Instruct-v0.3-bnb-4bit
4-bit quantized (bitsandbytes) instruct model based on
mistralai/Mistral-7B-Instruct-v0.3, fine-tuned with QLoRA on a 10% sample of HuggingFaceH4/ultrachat_200k for supervised fine-tuning (SFT).
TL;DR
- Base model: mistralai/Mistral-7B-Instruct-v0.3
- Quantization: 4-bit bitsandbytes (NF4 + double quant; bfloat16 compute)
- PEFT: QLoRA; LoRA applied to attention & MLP: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training: SFT on UltraChat 200k (10% sample) for 5 epochs
- Seq length: 2048
- Optimizer: adamw_8bit, cosine LR, warmup 10%
- Effective batch: per-device 2 × grad-accum 4
- Precision: bf16 compute
Intended use & limitations
Use cases. General assistant/chat and instruction following in English. The model is suited to producing helpful, safe, concise responses for everyday tasks.
Limitations. May produce inaccurate or biased content and lacks built-in moderation. Do not use for high-risk domains without additional safety layers or human review.
Quickstart (Transformers + bitsandbytes)
Requires transformers, accelerate, bitsandbytes, and a recent CUDA-enabled PyTorch build for 4-bit inference.
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch
model_id = "rapidfire-ai-inc/Mistral-7B-Instruct-v0.3-bnb-4bit"
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
)
# Load the tokenizer and the 4-bit quantized model (weights stay in NF4;
# compute runs in bfloat16).
tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain diffusion models in simple terms."}
]
# Format the conversation with the model's chat template, then generate.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,  # sampling must be enabled for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
print(tok.decode(out[0], skip_special_tokens=True))
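For interactive use, tokens can be streamed to stdout as they are generated. A minimal sketch using transformers' TextStreamer and the tok, model, and inputs objects from the quickstart above:

from transformers import TextStreamer

# Print decoded tokens as they are produced, skipping the echoed
# prompt and any special tokens.
streamer = TextStreamer(tok, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    streamer=streamer,
)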
Training details
Data
- Dataset: HuggingFaceH4/ultrachat_200k
- Sampling: 10% subset used for SFT before any DPO alignment (a loading sketch follows this list).
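The card does not include the exact sampling code; a minimal sketch of drawing a 10% subset with the datasets library via split slicing (the checkpoint's exact subset and seed are not specified, so treat this as illustrative rather than a reproduction recipe):

from datasets import load_dataset

# First 10% of the SFT training split of UltraChat 200k.
# Illustrative only: the actual sample used for training may differ.
train_ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:10%]")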
Method
- Approach: QLoRA (parameter-efficient fine-tuning on a 4-bit base)
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj (a preparation sketch follows below)
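A minimal sketch of how a 4-bit base is typically prepared for QLoRA training with peft. The training script itself is not part of this card, so variable names are illustrative; bnb_config is the quantization config shown further down.

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM

# Load the 4-bit base, run the standard k-bit training preparation,
# then attach LoRA adapters to the listed attention/MLP modules.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=64,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
)
peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable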
Hyperparameters
max_length = 2048
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
learning_rate = 2e-5
warmup_ratio = 0.1
weight_decay = 0.001
lr_scheduler_type = "cosine"
optim = "adamw_8bit"
bf16 = True
num_train_epochs = 5
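These values map directly onto transformers.TrainingArguments; a sketch under that assumption (output_dir is a placeholder, and max_length is applied at tokenization/packing time rather than here):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral7b-qlora-sft",   # placeholder path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,      # effective batch of 8 per device
    learning_rate=2e-5,
    warmup_ratio=0.1,
    weight_decay=0.001,
    lr_scheduler_type="cosine",
    optim="adamw_8bit",
    bf16=True,
    num_train_epochs=5,
)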
LoRA configuration
LoraConfig(
task_type="CAUSAL_LM",
r=64,
lora_alpha=64,
lora_dropout=0.05,
target_modules=[
"q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"
],
bias="none",
)
BitsAndBytes (4-bit) config
BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
)
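To sanity-check the quantized footprint after loading, transformers exposes get_memory_footprint on the model; a quick check (exact numbers vary by environment):

# Approximate size of the loaded weights; a 7B model in NF4 is
# roughly 4-5 GB, before activation and KV-cache memory.
print(f"{model.get_memory_footprint() / 1e9:.2f} GB")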
Inference tips
- Keep torch_dtype=torch.bfloat16 with 4-bit loading to balance speed and quality.
- Start with max_new_tokens=256, temperature=0.6–0.9, top_p=0.9, repetition_penalty=1.1–1.2 (see the example below).
- Use the tokenizer's chat template (apply_chat_template) to ensure proper formatting.
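Putting those suggestions together, with values picked from the ranges above (tune per use case):

out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,            # required for temperature/top_p to take effect
    temperature=0.7,           # from the suggested 0.6–0.9 range
    top_p=0.9,
    repetition_penalty=1.1,    # from the suggested 1.1–1.2 range
)
print(tok.decode(out[0], skip_special_tokens=True))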
Responsible AI & safety
This model can generate incorrect or harmful text. Add safety filters and human oversight for production deployments. Please report issues via the model repo.
License
Apache-2.0. Also comply with the base model’s license and usage terms.
Acknowledgements
- Base model: Mistral-7B-Instruct-v0.3 by Mistral AI.
- Dataset: UltraChat 200k by Hugging Face H4.
Citation
@misc{rapidfireai_mistral7b_bnb4bit_2025,
title = {Mistral-7B-Instruct-v0.3-bnb-4bit (RapidFire AI)},
author = {RapidFire AI, Inc.},
year = {2025},
howpublished = {\url{https://huggingface.co/rapidfire-ai-inc/Mistral-7B-Instruct-v0.3-bnb-4bit}}
}
Changelog
- v1.0 — Initial release: 4-bit quantized checkpoint with QLoRA SFT on UltraChat 200k (10% sample).