---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- mistral
- causal-lm
- text-generation
- 4-bit
- bitsandbytes
- qlora
- lora
- ultrachat
- rapidfire-ai
base_model: mistralai/Mistral-7B-Instruct-v0.3
datasets:
- HuggingFaceH4/ultrachat_200k
---

# rapidfire-ai-inc/Mistral-7B-Instruct-v0.3-bnb-4bit

> 4-bit quantized (bitsandbytes) instruct model based on `mistralai/Mistral-7B-Instruct-v0.3`, fine-tuned with QLoRA on a 10% sample of `HuggingFaceH4/ultrachat_200k` for supervised fine-tuning (SFT).

## TL;DR

- **Base model:** `mistralai/Mistral-7B-Instruct-v0.3`
- **Quantization:** 4-bit **bitsandbytes** (NF4 + double quantization; bfloat16 compute)
- **PEFT:** QLoRA; LoRA applied to attention & MLP projections: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
- **Training:** SFT on **UltraChat 200k** (10% sample) for **5 epochs**
- **Sequence length:** 2048
- **Optimizer:** `adamw_8bit`, cosine LR schedule, 10% warmup
- **Effective batch size:** 8 per device (per-device batch 2 × gradient accumulation 4)
- **Precision:** bf16 compute

---

## Intended use & limitations

**Use cases.** General assistant/chat and instruction following in English. The model is intended to produce helpful, concise responses for everyday tasks.

**Limitations.** It may produce inaccurate or biased content and has no built-in moderation. Do not use it in high-risk domains without additional safety layers and human review.

---

## Quickstart (Transformers + bitsandbytes)

> Requires `transformers`, `accelerate`, and `bitsandbytes`, plus a CUDA-capable GPU and a recent CUDA build for 4-bit inference.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "rapidfire-ai-inc/Mistral-7B-Instruct-v0.3-bnb-4bit"

# 4-bit NF4 quantization with double quantization and bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

# Format the conversation with the model's chat template
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain diffusion models in simple terms."}
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,  # needed for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
print(tok.decode(out[0], skip_special_tokens=True))
```
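
If you prefer the high-level `pipeline` API, the snippet below is a minimal sketch that wraps the `model`, `tok`, and `prompt` objects from the quickstart above (it assumes a recent `transformers` release):

```python
from transformers import pipeline

# Reuses `model`, `tok`, and `prompt` from the quickstart above.
pipe = pipeline("text-generation", model=model, tokenizer=tok)
result = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    return_full_text=False,  # return only the newly generated text
)
print(result[0]["generated_text"])
```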

---

## Training details

### Data
- **Dataset:** `HuggingFaceH4/ultrachat_200k`
- **Sampling:** a 10% subset, used for SFT prior to any DPO alignment (see the loading sketch below).
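
A minimal sketch of the sampling step, assuming the public `train_sft` split of UltraChat 200k; the card does not specify whether the 10% subset was a head slice or a random sample, so the slicing below is only illustrative:

```python
from datasets import load_dataset

# Take roughly 10% of the SFT split (illustrative; the exact sampling
# strategy used for training is not documented here).
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:10%]")
print(dataset)  # features include the multi-turn "messages" field
```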

### Method
- **Approach:** QLoRA (parameter-efficient fine-tuning on a 4-bit base)
- **Target modules:** `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`

### Hyperparameters
```
max_length = 2048
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
learning_rate = 2e-5
warmup_ratio = 0.1
weight_decay = 0.001
lr_scheduler_type = "cosine"
optim = "adamw_8bit"
bf16 = True
num_train_epochs = 5
```
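
For reference, a sketch of how these values map onto Hugging Face `TrainingArguments`; `output_dir` is a placeholder added here, and the 2048-token `max_length` is handled by the SFT data pipeline rather than by `TrainingArguments`:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral7b-ultrachat-qlora",  # placeholder
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    weight_decay=0.001,
    lr_scheduler_type="cosine",
    optim="adamw_8bit",  # bitsandbytes 8-bit AdamW
    bf16=True,
    num_train_epochs=5,
)
```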

### LoRA configuration
```python
LoraConfig(
    task_type="CAUSAL_LM",
    r=64,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    bias="none",
)
```

### BitsAndBytes (4-bit) config
```python
BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
```
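
Putting the pieces together, here is a minimal, self-contained sketch (not the exact training script) of how the 4-bit base model and the LoRA adapter described above can be assembled with `peft` before handing the model to a trainer such as TRL's `SFTTrainer`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "mistralai/Mistral-7B-Instruct-v0.3"

# Mirrors the BitsAndBytes config listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for k-bit training (casts norms, enables
# input gradients for gradient checkpointing).
base_model = prepare_model_for_kbit_training(base_model)

# Mirrors the LoRA config listed above.
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=64,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()

# The SFT loop itself (e.g. TRL's SFTTrainer with the TrainingArguments
# above and the sampled UltraChat data) is omitted from this sketch.
```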

---

## Inference tips

- Keep `torch_dtype=torch.bfloat16` alongside 4-bit loading to balance speed and quality.
- Start with `max_new_tokens=256`, `temperature=0.6–0.9`, `top_p=0.9`, and `repetition_penalty=1.1–1.2`, and set `do_sample=True` so the sampling parameters take effect.
- Use the tokenizer's chat template (`apply_chat_template`) to ensure the prompt is formatted correctly.

---

## Responsible AI & safety

This model can generate incorrect or harmful text. Add safety filters and human oversight for production deployments. Please report issues via the model repo.

---

## License

Apache-2.0. Also comply with the base model's license and usage terms.

---

## Acknowledgements

- Base model: **Mistral-7B-Instruct-v0.3** by Mistral AI.
- Dataset: **UltraChat 200k** by Hugging Face H4.

---

## Citation

```bibtex
@misc{rapidfireai_mistral7b_bnb4bit_2025,
  title        = {Mistral-7B-Instruct-v0.3-bnb-4bit (RapidFire AI)},
  author       = {RapidFire AI, Inc.},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/rapidfire-ai-inc/Mistral-7B-Instruct-v0.3-bnb-4bit}}
}
```

---

## Changelog

- **v1.0**: Initial release; 4-bit quantized checkpoint with QLoRA SFT on UltraChat 200k (10% sample).