patrickvonplaten committed on
Commit 0f7f0bc · verified · 1 Parent(s): 8d27a0d

Update README.md

Files changed (1)
  1. README.md +11 -10
README.md CHANGED
@@ -124,11 +124,19 @@ to implement production-ready inference pipelines.
 
 **_Installation_**
 
- Please make sure to use our custom vLLM docker image [mistralllm/vllm_devstral:latest](https://hub.docker.com/repository/docker/mistralllm/vllm_devstral/tags/latest/sha256:d2ca883e8b4e0bec7d6953706410d2741e88ade6e07e576a51756f4bf51a0ffd):
+ Please make sure to install vLLM nightly:
 
 ```
- docker pull mistralllm/vllm_devstral:latest
- docker run -it mistralllm/vllm_devstral:latest
+ uv pip install -U vllm \
+     --torch-backend=auto \
+     --extra-index-url https://wheels.vllm.ai/nightly
+ ```
+ 
+ Alternatively, you can also directly use the nightly docker image [vllm/vllm-openai:nightly](https://hub.docker.com/layers/vllm/vllm-openai/nightly/images/sha256-a8cf9f2284a648074d6179e1d9caf74b3183536224bcf518fff73cc2b90dbc2f):
+ 
+ ```
+ docker pull vllm/vllm-openai:nightly
+ docker run -it vllm/vllm-openai:nightly
 ```
 
 Alternatively, you can also install `vllm` from latest main by following instructions [here](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#python-only-build).
@@ -231,12 +239,6 @@ uv pip install git+https://github.com/huggingface/transformers
 
 And run the following code snippet:
 
- > [!Warning]
- > While the checkpoint is serialized in FP8 format, there is currently a problem
- > with "true" FP8 inference. Hence the weights are automatically dequantized to BFloat16
- > as per [this config setting](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512/blob/main/config.json#L13).
- > Once the bug is fixed, we will by default run the model in "true" FP8. Stay tuned by following [this issue](https://github.com/huggingface/transformers/issues/42746).
-
 ```python
 import torch
 from transformers import (
@@ -248,7 +250,6 @@ model_id = "mistralai/Devstral-Small-2-24B-Instruct-2512"
 
 tokenizer = MistralCommonBackend.from_pretrained(model_id)
 model = Mistral3ForConditionalGeneration.from_pretrained(model_id, device_map="auto")
- model = model.to(torch.bfloat16)
 
 SP = """You are operating as and within Mistral Vibe, a CLI coding-agent built by Mistral AI and powered by default by the Devstral family of models. It wraps Mistral's Devstral models to enable natural language interaction with a local codebase. Use the available tools when helpful.
 
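For reference, once vLLM nightly is installed per the updated instructions above, the model is typically exposed through vLLM's OpenAI-compatible server (e.g. `vllm serve mistralai/Devstral-Small-2-24B-Instruct-2512`). The sketch below shows how such a server could be queried from Python; the base URL, port, placeholder API key, prompt, and sampling settings are illustrative assumptions, not part of this README.

```python
# Minimal sketch, assuming a vLLM OpenAI-compatible server is already running
# locally on the default port 8000 (e.g. started via `vllm serve ...`).
# The base_url, placeholder api_key, prompt, and temperature are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="mistralai/Devstral-Small-2-24B-Instruct-2512",
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    temperature=0.15,
)
print(response.choices[0].message.content)
```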
 
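The diff only shows the head and a few later lines of the updated Python snippet. A hedged reconstruction of its likely shape is sketched below: the visible lines confirm `MistralCommonBackend` and `Mistral3ForConditionalGeneration`, while the chat-template call and generation settings are assumptions, not the README's exact code.

```python
# Hedged sketch of the truncated snippet's likely shape. The visible diff
# confirms MistralCommonBackend and Mistral3ForConditionalGeneration; the
# chat-template call and generation settings below are illustrative.
import torch
from transformers import (
    Mistral3ForConditionalGeneration,
    MistralCommonBackend,
)

model_id = "mistralai/Devstral-Small-2-24B-Instruct-2512"

tokenizer = MistralCommonBackend.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```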