Update README.md

README.md CHANGED

@@ -124,11 +124,19 @@ to implement production-ready inference pipelines.

 **_Installation_**

-Please make sure to
+Please make sure to install vLLM nightly:

 ```
-
-
+uv pip install -U vllm \
+    --torch-backend=auto \
+    --extra-index-url https://wheels.vllm.ai/nightly
+```
+
+Alternatively you can also directly use the nightly docker image [vllm/vllm-openai:nightly](https://hub.docker.com/layers/vllm/vllm-openai/nightly/images/sha256-a8cf9f2284a648074d6179e1d9caf74b3183536224bcf518fff73cc2b90dbc2f):
+
+```
+docker pull vllm/vllm-openai:nightly
+docker run -it vllm/vllm-openai:nightly
 ```

 Alternatively, you can also install `vllm` from latest main by following instructions [here](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#python-only-build).
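Note on the Docker route: `docker run -it vllm/vllm-openai:nightly` only starts the container interactively; the image's entrypoint is vLLM's OpenAI-compatible API server, which still needs GPU access and a model to serve. A minimal sketch of a more typical invocation follows (the port, cache mount, and flags are assumptions and not part of this change; only the model ID comes from the README):

```
# Sketch only: expose the API on port 8000, reuse the local HF cache,
# and serve the Devstral checkpoint referenced later in the README.
docker run --gpus all -p 8000:8000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    vllm/vllm-openai:nightly \
    --model mistralai/Devstral-Small-2-24B-Instruct-2512
```

Once the server is up, it exposes the usual OpenAI-compatible endpoints such as `/v1/chat/completions`.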
@@ -231,12 +239,6 @@ uv pip install git+https://github.com/huggingface/transformers

 And run the following code snippet:

-> [!Warning]
-> While the checkpoint is serialized in FP8 format, there is currently a problem
-> with "true" FP8 inference. Hence the weights are automatically dequantized to BFloat16
-> as per [this config setting](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512/blob/main/config.json#L13).
-> Once the bug is fixed, we will by default run the model in "true" FP8. Stay tuned by following [this issue](https://github.com/huggingface/transformers/issues/42746).
-
 ```python
 import torch
 from transformers import (
@@ -248,7 +250,6 @@ model_id = "mistralai/Devstral-Small-2-24B-Instruct-2512"

 tokenizer = MistralCommonBackend.from_pretrained(model_id)
 model = Mistral3ForConditionalGeneration.from_pretrained(model_id, device_map="auto")
-model = model.to(torch.bfloat16)

 SP = """You are operating as and within Mistral Vibe, a CLI coding-agent built by Mistral AI and powered by default by the Devstral family of models. It wraps Mistral's Devstral models to enable natural language interaction with a local codebase. Use the available tools when helpful.
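The Python snippet is only partially visible in this diff (it continues past the lines shown). A plausible continuation, not taken from the README, assuming `MistralCommonBackend` exposes the standard `apply_chat_template`/`decode` tokenizer API and reusing the `tokenizer`, `model`, and `SP` variables defined above:

```python
# Hypothetical continuation of the README snippet; not part of this diff.
messages = [
    {"role": "system", "content": SP},
    {"role": "user", "content": "Write a function that reverses a linked list."},
]

# Assumes the mistral-common tokenizer backend implements apply_chat_template.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```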