rodrigomt committed · verified
Commit 091430d · 1 Parent(s): 2d0c382

Update README.md

Files changed (1):
  1. README.md +120 -17

README.md CHANGED
@@ -12,19 +12,47 @@ tags:
  - quelmap/Lightning-4b
  - Intel/hebrew-math-tutor-v1
  - GetSoloTech/Qwen3-Code-Reasoning-4B
  ---

- # quem-v2-4b

- quem-v2-4b is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
  * [janhq/Jan-v1-2509](https://huggingface.co/janhq/Jan-v1-2509)
  * [quelmap/Lightning-4b](https://huggingface.co/quelmap/Lightning-4b)
  * [Intel/hebrew-math-tutor-v1](https://huggingface.co/Intel/hebrew-math-tutor-v1)
  * [GetSoloTech/Qwen3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B)

- ## 🧩 Configuration

- yaml
  models:
  - model: janhq/Jan-v1-2509
  parameters:
@@ -55,26 +83,101 @@ parameters:

  device: auto
  dtype: bfloat16
- ## 💻 Usage

- python
- !pip install -qU transformers accelerate

- from transformers import AutoTokenizer
- import transformers
  import torch

- model = "rodrigomt/quem-v2-4b"
- messages = [{"role": "user", "content": "What is a large language model?"}]

- tokenizer = AutoTokenizer.from_pretrained(model)
- prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
- pipeline = transformers.pipeline(
  "text-generation",
- model=model,
  torch_dtype=torch.float16,
  device_map="auto",
  )

- outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
- print(outputs[0]["generated_text"])

  - quelmap/Lightning-4b
  - Intel/hebrew-math-tutor-v1
  - GetSoloTech/Qwen3-Code-Reasoning-4B
+ language:
+ - en
+ - pt
+ pipeline_tag: text-generation
  ---

+ # 🤖 quem-4b v2
+
+ A 4-billion-parameter merged language model built on the **Qwen3** family. **quem-v2-4b** blends four complementary models using **LazyMergekit** with the **DARE-TIES** method to deliver a compact, versatile model for instruction following, coding assistance, and reasoning.
+
+ ## 📋 Overview
+
+ **quem-v2-4b** is a carefully balanced merge of four specialized 4B-class models. Using **DARE-TIES** with equal weights, it aims to retain strengths across general conversation (Jan), fast responses (Lightning), mathematical reasoning (Hebrew Math Tutor), and code reasoning (Qwen3 Code Reasoning).
+
+ ### ✨ Key Features
+
+ * **Balanced Merge:** Equal weights (25% each) for stability across skills.
+ * **Reasoning & Code:** Improved chain-of-thought-style reasoning and code understanding inherited from the contributor models.
+ * **Compact & Efficient:** 4B parameters for fast inference on a single consumer GPU.
+ * **Instruction-Tuned:** Works out of the box with standard chat prompts via the HF chat template.
+
+ ---
+
+ ## 🔧 Base Models

  * [janhq/Jan-v1-2509](https://huggingface.co/janhq/Jan-v1-2509)
  * [quelmap/Lightning-4b](https://huggingface.co/quelmap/Lightning-4b)
  * [Intel/hebrew-math-tutor-v1](https://huggingface.co/Intel/hebrew-math-tutor-v1)
  * [GetSoloTech/Qwen3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B)

+ All contributions are merged on top of a Qwen3 base (see the configuration below).
+
+ ---
+
+ ## 🛠️ Merge Method & Configuration
+
+ The merge was performed using **[LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing)**, which integrates the different specializations into a single checkpoint.

+ ### Merge YAML (LazyMergekit)
+
+ ```yaml
  models:
  - model: janhq/Jan-v1-2509
  parameters:

  device: auto
  dtype: bfloat16
+ ```
+
+ ---
+
+ ## 💻 Usage (Transformers)
+
+ Install:
+
+ ```bash
+ pip install -U transformers accelerate torch
+ ```

+ Minimal chat example:

+ ```python
+ from transformers import AutoTokenizer, pipeline
  import torch

+ model_id = "rodrigomt/quem-v2-4b"

+ messages = [
+     {"role": "user", "content": "What is a large language model?"}
+ ]
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ prompt = tokenizer.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+
+ pipe = pipeline(
      "text-generation",
+     model=model_id,
      torch_dtype=torch.float16,
      device_map="auto",
  )

+ out = pipe(
+     prompt,
+     max_new_tokens=256,
+     do_sample=True,
+     temperature=0.7,
+     top_k=50,
+     top_p=0.95,
+ )
+ print(out[0]["generated_text"])
+ ```
+
+ ### Prompting Tips
+
+ * Use standard **system / user / assistant** chat structure.
+ * For coding tasks, include concise requirements, the desired language, and any constraints (see the sketch after this list).
+ * For math/logic tasks, allow slightly higher `max_new_tokens` and consider a lower temperature (e.g., `temperature=0.3–0.5`) for more deterministic reasoning.
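+
+ A small sketch tying these tips together: a coding-oriented prompt with a short system message and a lower temperature. It reuses the `tokenizer` and `pipe` objects from the usage example above; the system prompt and decoding values are illustrative suggestions, not settings shipped with the model.
+
+ ```python
+ # Hypothetical coding-task prompt following the tips above.
+ code_messages = [
+     {"role": "system", "content": "You are a careful coding assistant. Reply with working, commented code."},
+     {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
+ ]
+
+ # Render the chat template, then decode with a lower temperature for more
+ # deterministic code generation.
+ code_prompt = tokenizer.apply_chat_template(
+     code_messages, tokenize=False, add_generation_prompt=True
+ )
+ out = pipe(
+     code_prompt,
+     max_new_tokens=512,   # extra room for code plus a short explanation
+     do_sample=True,
+     temperature=0.3,      # lower temperature, as suggested above
+     top_p=0.9,
+ )
+ print(out[0]["generated_text"])
+ ```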
+
+ ---
+
+ ## ⚙️ Inference Notes
+
+ * **Precision:** `bfloat16` (bf16) by default; `float16` also works well on most GPUs.
+ * **Quantization:** 4-bit/8-bit quantization via `bitsandbytes` or `auto-gptq` can reduce memory use; expect some quality trade-offs (see the sketch after this list).
+ * **Decoding:**
+
+   * General chat: `temperature=0.7`, `top_p=0.9–0.95`, `max_new_tokens=256`.
+   * Code/Math: lower temperature (`0.2–0.5`); optionally increase `max_new_tokens` to 512–1024 for step-by-step reasoning.
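+
+ If VRAM is tight, a quantized load is one option. Below is a minimal sketch assuming `bitsandbytes` is installed alongside `transformers`; the 4-bit settings shown are common illustrative defaults, not values validated for this merge.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ model_id = "rodrigomt/quem-v2-4b"
+
+ # 4-bit NF4 quantization with bf16 compute (illustrative settings).
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     quantization_config=bnb_config,
+     device_map="auto",
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ ```
+
+ As noted above, expect some quality trade-off versus full bf16/fp16 weights.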
+
+ ---
+
+ ## 🧪 Evaluation
+
+ No unified public benchmark is included in this release. Early local testing indicates improved step-by-step reasoning compared to the prior 4B merge on similar hardware, but results are **highly sensitive** to decoding parameters and prompts. Community PRs with reproducible evals (Arena/AlpacaEval/HELM/OpenLLM Leaderboards/LocalAIMe) are welcome.
+
+ ---
+
+ ## 🖥️ System Requirements
+
+ **Minimum (single GPU):**
+
+ * RAM: 16 GB
+ * VRAM: 8 GB (e.g., RTX 3060 Ti / 3070 class)
+ * Storage: ~20 GB free
+ * CPU: Recent quad-core
+
+ **Recommended:**
+
+ * RAM: 32 GB
+ * VRAM: 12 GB+ (e.g., RTX 4070 / 3080 or higher)
+ * CPU: Modern multi-core
+
+ > Quantized weights can reduce VRAM but may affect quality.
+
+ ---
+
+ ## 🙌 Acknowledgments
+
+ Thanks to the authors and communities behind **Jan**, **Lightning**, **Intel Hebrew Math Tutor**, **Qwen3 Code Reasoning**, and the **LazyMergekit** toolchain.
+
+ ## 📝 License
+
+ This model is licensed under the **Apache 2.0 License**.