---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- mistral
- causal-lm
- text-generation
- 4-bit
- bitsandbytes
- qlora
- lora
- ultrachat
- rapidfire-ai
base_model: mistralai/Mistral-7B-Instruct-v0.3
datasets:
- HuggingFaceH4/ultrachat_200k
---

# rapidfire-ai-inc/Mistral-7B-Instruct-v0.3-bnb-4bit

> 4-bit quantized (bitsandbytes) instruct model based on `mistralai/Mistral-7B-Instruct-v0.3`, fine-tuned with QLoRA on a 10% sample of `HuggingFaceH4/ultrachat_200k` for supervised fine-tuning (SFT).

## TL;DR

- **Base model:** `mistralai/Mistral-7B-Instruct-v0.3`
- **Quantization:** 4-bit **bitsandbytes** (NF4 + double quantization; bfloat16 compute)
- **PEFT:** QLoRA; LoRA applied to attention & MLP projections: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
- **Training:** SFT on **UltraChat 200k** (10% sample) for **5 epochs**
- **Sequence length:** 2048
- **Optimizer:** `adamw_8bit`, cosine LR schedule, 10% warmup
- **Effective batch size:** 8 per device (per-device batch 2 × gradient accumulation 4)
- **Precision:** bf16 compute

---

## Intended use & limitations

**Use cases.** General assistant/chat and instruction following in English. The model is intended to produce helpful, concise responses for everyday tasks.

**Limitations.** It may produce inaccurate or biased content and has no built-in moderation. Do not use it in high-risk domains without additional safety layers and human review.

---

## Quickstart (Transformers + bitsandbytes)

> Requires `transformers`, `accelerate`, and `bitsandbytes`, plus a CUDA-capable GPU and a recent CUDA build for 4-bit inference.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "rapidfire-ai-inc/Mistral-7B-Instruct-v0.3-bnb-4bit"

# 4-bit NF4 quantization with double quantization and bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

# Format the conversation with the model's chat template
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain diffusion models in simple terms."}
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,  # needed for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
print(tok.decode(out[0], skip_special_tokens=True))
```
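
If you prefer the high-level `pipeline` API, the snippet below is a minimal sketch that wraps the `model`, `tok`, and `prompt` objects from the quickstart above (it assumes a recent `transformers` release):

```python
from transformers import pipeline

# Reuses `model`, `tok`, and `prompt` from the quickstart above.
pipe = pipeline("text-generation", model=model, tokenizer=tok)
result = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    return_full_text=False,  # return only the newly generated text
)
print(result[0]["generated_text"])
```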

---

## Training details

### Data
- **Dataset:** `HuggingFaceH4/ultrachat_200k`
- **Sampling:** a 10% subset, used for SFT prior to any DPO alignment (see the loading sketch below).
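
A minimal sketch of the sampling step, assuming the public `train_sft` split of UltraChat 200k; the card does not specify whether the 10% subset was a head slice or a random sample, so the slicing below is only illustrative:

```python
from datasets import load_dataset

# Take roughly 10% of the SFT split (illustrative; the exact sampling
# strategy used for training is not documented here).
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:10%]")
print(dataset)  # features include the multi-turn "messages" field
```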

### Method
- **Approach:** QLoRA (parameter-efficient fine-tuning on a 4-bit base)
- **Target modules:** `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`

### Hyperparameters
```
max_length = 2048
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
learning_rate = 2e-5
warmup_ratio = 0.1
weight_decay = 0.001
lr_scheduler_type = "cosine"
optim = "adamw_8bit"
bf16 = True
num_train_epochs = 5
```
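
For reference, a sketch of how these values map onto Hugging Face `TrainingArguments`; `output_dir` is a placeholder added here, and the 2048-token `max_length` is handled by the SFT data pipeline rather than by `TrainingArguments`:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral7b-ultrachat-qlora",  # placeholder
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    weight_decay=0.001,
    lr_scheduler_type="cosine",
    optim="adamw_8bit",  # bitsandbytes 8-bit AdamW
    bf16=True,
    num_train_epochs=5,
)
```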

### LoRA configuration
```python
LoraConfig(
    task_type="CAUSAL_LM",
    r=64,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    bias="none",
)
```

### BitsAndBytes (4-bit) config
```python
BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
```
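
Putting the pieces together, here is a minimal, self-contained sketch (not the exact training script) of how the 4-bit base model and the LoRA adapter described above can be assembled with `peft` before handing the model to a trainer such as TRL's `SFTTrainer`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "mistralai/Mistral-7B-Instruct-v0.3"

# Mirrors the BitsAndBytes config listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for k-bit training (casts norms, enables
# input gradients for gradient checkpointing).
base_model = prepare_model_for_kbit_training(base_model)

# Mirrors the LoRA config listed above.
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=64,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()

# The SFT loop itself (e.g. TRL's SFTTrainer with the TrainingArguments
# above and the sampled UltraChat data) is omitted from this sketch.
```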

---

## Inference tips

- Keep `torch_dtype=torch.bfloat16` alongside 4-bit loading to balance speed and quality.
- Start with `max_new_tokens=256`, `temperature=0.6–0.9`, `top_p=0.9`, and `repetition_penalty=1.1–1.2`, and set `do_sample=True` so the sampling parameters take effect.
- Use the tokenizer's chat template (`apply_chat_template`) to ensure the prompt is formatted correctly.

---

## Responsible AI & safety

This model can generate incorrect or harmful text. Add safety filters and human oversight for production deployments. Please report issues via the model repo.

---

## License

Apache-2.0. Also comply with the base model's license and usage terms.

---

## Acknowledgements

- Base model: **Mistral-7B-Instruct-v0.3** by Mistral AI.
- Dataset: **UltraChat 200k** by Hugging Face H4.

---

## Citation

```bibtex
@misc{rapidfireai_mistral7b_bnb4bit_2025,
  title        = {Mistral-7B-Instruct-v0.3-bnb-4bit (RapidFire AI)},
  author       = {RapidFire AI, Inc.},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/rapidfire-ai-inc/Mistral-7B-Instruct-v0.3-bnb-4bit}}
}
```

---

## Changelog

- **v1.0**: Initial release; 4-bit quantized checkpoint with QLoRA SFT on UltraChat 200k (10% sample).