nvidia/Nemotron-Flash-3B-Instruct

YongganFu committed on Oct 25

Commit 0750f7e · verified · 1 Parent(s): 426f2fa

Update README.md

Files changed (1)

README.md +4 -4


@@ -3,14 +3,14 @@ library_name: transformers
 tags: []
 ---
-# Nemotron-Hymba2-3B-Instruct
+# Nemotron-Flash-3B-Instruct
-Nemotron-Hymba2 is a new hybrid SLM model family that outperforms Qwen models in accuracy (math, coding, and commonsense), batch-size-1 latency, and throughput. More details are in our NeurIPS 2025 [paper](https://drive.google.com/drive/folders/17vOGktwUfUpRAJPGJUV6oX8XwLSczZtv?usp=sharing).
+Nemotron-Flash is a new hybrid SLM model family that outperforms Qwen models in accuracy (math, coding, and commonsense), batch-size-1 latency, and throughput. More details are in our NeurIPS 2025 [paper](https://drive.google.com/drive/folders/17vOGktwUfUpRAJPGJUV6oX8XwLSczZtv?usp=sharing).
 Docker path: `/lustre/fsw/portfolios/nvr/users/yongganf/docker/megatron_py25_fast_slm.sqsh` on NRT.
-## Chat with Nemotron-Hymba2-3B-Instruct
+## Chat with Nemotron-Flash-3B-Instruct
 We wrap the model into CUDA Graph for fast generation:
@@ -18,7 +18,7 @@ We wrap the model into CUDA Graph for fast generation:
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
-repo_name = "nvidia/Nemotron-Hymba2-3B-Instruct"
+repo_name = "nvidia/Nemotron-Flash-3B-Instruct"
 tokenizer = AutoTokenizer.from_pretrained(repo_name, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(repo_name, trust_remote_code=True)
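
All four changed lines in this commit amount to one substitution: the family name `Nemotron-Hymba2` becomes `Nemotron-Flash`. For downstream scripts or notes that still carry the old name, a minimal sketch of the same substitution might look like the following. The helper `migrate_model_name` is hypothetical and not part of the repository; only the two name strings come from the diff.

```python
# Old and new family names, taken from the diff in this commit.
OLD_NAME = "Nemotron-Hymba2"
NEW_NAME = "Nemotron-Flash"


def migrate_model_name(text: str) -> str:
    """Hypothetical helper: replace every occurrence of the old family name."""
    return text.replace(OLD_NAME, NEW_NAME)


# Apply the substitution to the repo_name line removed by this commit.
old_line = 'repo_name = "nvidia/Nemotron-Hymba2-3B-Instruct"'
print(migrate_model_name(old_line))
# prints: repo_name = "nvidia/Nemotron-Flash-3B-Instruct"
```

Text that does not contain the old name passes through unchanged, so the helper can be run over a whole file safely.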