liushaowei committed · commit 579a3a3 · 1 parent: 4308c42

    update readme
README.md CHANGED

@@ -3,14 +3,14 @@ license: mit
 library_name: transformers
 ---
 <div align="center">
-<a href="https://github.com/MoonshotAI/
+<a href="https://github.com/MoonshotAI/Moonlight"><img width="80%" src="figures/banner.png"></a>
 </div>
 
 <!-- # Muon is Scalable For LLM Training -->
 
 <div align="center">
-<a href="https://github.com/MoonshotAI/
-<a href="https://huggingface.co/moonshotai/Moonlight"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> HuggingFace</b></a> |
+<a href="https://github.com/MoonshotAI/Moonlight/blob/master/Moonlight.pdf" ><img src="figures/logo.png" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> Tech Report</b></a> |
+<a href="https://huggingface.co/moonshotai/Moonlight-16B-A3B"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> HuggingFace</b></a> |
 <a href="#"><img src="figures/megatron.png" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;">Megatron(coming soon)</b></a>
 </div>
 
@@ -85,8 +85,8 @@ We compared Moonlight with SOTA public models at similar scale:
 
 | **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download Link** |
 | :------------: | :------------: | :------------: | :------------: | :------------: |
-| Moonlight | 16B | 3B | 8K | [🤗 Hugging Face](https://huggingface.co/moonshotai/Moonlight) |
-| Moonlight-Instruct | 16B | 3B | 8K | [🤗 Hugging Face](https://huggingface.co/moonshotai/Moonlight-Instruct) |
+| Moonlight-16B-A3B | 16B | 3B | 8K | [🤗 Hugging Face](https://huggingface.co/moonshotai/Moonlight-16B-A3B) |
+| Moonlight-16B-A3B-Instruct | 16B | 3B | 8K | [🤗 Hugging Face](https://huggingface.co/moonshotai/Moonlight-16B-A3B-Instruct) |
 
 </div>
 
@@ -94,7 +94,7 @@ We compared Moonlight with SOTA public models at similar scale:
 
 We introduce how to use our model at inference stage using transformers library. It is recommended to use python=3.10, torch>=2.1.0, and the latest version of transformers as the development environment.
 
-For our pretrained model (Moonlight):
+For our pretrained model (Moonlight-16B-A3B):
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
@@ -113,7 +113,7 @@ generated_ids = model.generate(**inputs, max_new_tokens=100)
 response = tokenizer.batch_decode(generated_ids)[0]
 ```
 
-For our instruct model (Moonlight-Instruct):
+For our instruct model (Moonlight-16B-A3B-Instruct):
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer