Update README.md

README.md CHANGED

@@ -124,11 +124,19 @@ to implement production-ready inference pipelines.

 **_Installation_**

-Please make sure to
+Please make sure to install vLLM nightly:

 ```
-
-
+uv pip install -U vllm \
+    --torch-backend=auto \
+    --extra-index-url https://wheels.vllm.ai/nightly
+```
+
+Alternatively you can also directly use the nightly docker image [vllm/vllm-openai:nightly](https://hub.docker.com/layers/vllm/vllm-openai/nightly/images/sha256-a8cf9f2284a648074d6179e1d9caf74b3183536224bcf518fff73cc2b90dbc2f):
+
+```
+docker pull vllm/vllm-openai:nightly
+docker run -it vllm/vllm-openai:nightly
 ```

 Alternatively, you can also install `vllm` from latest main by following instructions [here](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#python-only-build).
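Note on the Docker route: `docker run -it vllm/vllm-openai:nightly` only starts the container interactively; the image's entrypoint is vLLM's OpenAI-compatible API server, which still needs GPU access and a model to serve. A minimal sketch of a more typical invocation follows (the port, cache mount, and flags are assumptions and not part of this change; only the model ID comes from the README):

```
# Sketch only: expose the API on port 8000, reuse the local HF cache,
# and serve the Devstral checkpoint referenced later in the README.
docker run --gpus all -p 8000:8000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    vllm/vllm-openai:nightly \
    --model mistralai/Devstral-Small-2-24B-Instruct-2512
```

Once the server is up, it exposes the usual OpenAI-compatible endpoints such as `/v1/chat/completions`.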
@@ -231,12 +239,6 @@ uv pip install git+https://github.com/huggingface/transformers

 And run the following code snippet:

-> [!Warning]
-> While the checkpoint is serialized in FP8 format, there is currently a problem
-> with "true" FP8 inference. Hence the weights are automatically dequantized to BFloat16
-> as per [this config setting](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512/blob/main/config.json#L13).
-> Once the bug is fixed, we will by default run the model in "true" FP8. Stay tuned by following [this issue](https://github.com/huggingface/transformers/issues/42746).
-
 ```python
 import torch
 from transformers import (
@@ -248,7 +250,6 @@ model_id = "mistralai/Devstral-Small-2-24B-Instruct-2512"

 tokenizer = MistralCommonBackend.from_pretrained(model_id)
 model = Mistral3ForConditionalGeneration.from_pretrained(model_id, device_map="auto")
-model = model.to(torch.bfloat16)

 SP = """You are operating as and within Mistral Vibe, a CLI coding-agent built by Mistral AI and powered by default by the Devstral family of models. It wraps Mistral's Devstral models to enable natural language interaction with a local codebase. Use the available tools when helpful.
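The Python snippet is only partially visible in this diff (it continues past the lines shown). A plausible continuation, not taken from the README, assuming `MistralCommonBackend` exposes the standard `apply_chat_template`/`decode` tokenizer API and reusing the `tokenizer`, `model`, and `SP` variables defined above:

```python
# Hypothetical continuation of the README snippet; not part of this diff.
messages = [
    {"role": "system", "content": SP},
    {"role": "user", "content": "Write a function that reverses a linked list."},
]

# Assumes the mistral-common tokenizer backend implements apply_chat_template.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```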