Add link to tech report. Fix typo in usage example #2
README.md CHANGED
@@ -10,7 +10,7 @@ license_link: >-

Llama-3.1-Minitron-4B-Width-Base is a base text-to-text model that can be adopted for a variety of natural language generation tasks.
It is obtained by pruning Llama-3.1-8B; specifically, we prune model embedding size and MLP intermediate dimension.
-Following pruning, we perform continued training with distillation using 94 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose.
+Following pruning, we perform continued training with distillation using 94 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose. Please refer to our [technical report](https://arxiv.org/abs/2408.11796) for more details.

This model is ready for commercial use.


@@ -59,7 +59,7 @@ import torch
from transformers import AutoTokenizer, LlamaForCausalLM

# Load the tokenizer and model
-model_path = "nvidia/
+model_path = "nvidia/Llama-3.1-Minitron-4B-Width-Base"
tokenizer = AutoTokenizer.from_pretrained(model_path)

device = 'cuda'

@@ -143,4 +143,6 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

## References
-
+
+* [Compact Language Models via Pruning and Knowledge Distillation](https://arxiv.org/abs/2407.14679)
+* [LLM Pruning and Distillation in Practice: The Minitron Approach](https://arxiv.org/abs/2408.11796)