# Qwen2.5-VL-3B-Instruct: Optimized for SiMa.ai Modalix

## Overview

This repository contains the Qwen2.5-VL-3B-Instruct model, optimized and compiled for the SiMa.ai Modalix platform.

- Model Architecture: Qwen2.5-VL (3B parameters)
- Quantization: INT4 (A16W4)
- Prompt Processing: A16W4 (16-bit activations, 4-bit weights)
- Token Generation: A16W4 (16-bit activations, 4-bit weights)
- Maximum Context Length: 2048 tokens
- Input Resolution: 448x448 (fixed)
- Source Model: Qwen/Qwen2.5-VL-3B-Instruct
## Performance

The following performance metrics were measured with an image input and a 50-token text prompt.

| Model | Precision | Device | Response Rate (tokens/sec) | Time to First Token (sec) |
|---|---|---|---|---|
| Qwen2.5-VL-3B-Instruct | A16W4 | Modalix | 19.1 | 0.91 |
## Prerequisites

To run this model, you need:

- A SiMa.ai Modalix device
- SiMa.ai CLI (`sima-cli`), installed on your Modalix device
- Hugging Face CLI (`hf`), for downloading the model
## Installation & Deployment

Follow these steps to deploy the model to your Modalix device.

### 1. Install the LLiMa Demo Application

> ⚠️ Critical Requirement: This model requires the LLiMa Beta runtime, version 2.0.1. You must install the beta version (`-t beta`) even if a previous version of LLiMa is already installed. Standard releases (e.g., v2.0.0) are not compatible with this model.

On your Modalix device, install the LLiMa Beta runtime using `sima-cli`:

```bash
# Create a directory for LLiMa
cd /media/nvme
mkdir -p llima
cd llima

# Install the LLiMa Beta runtime code
sima-cli install samples/llima -t beta
```

Note: To download only the LLiMa runtime code, select 🚫 Skip when prompted.
### 2. Download the Model

Download the compiled model assets from this repository directly to your device:

```bash
# Download the model to a local directory
cd /media/nvme/llima
hf download simaai/Qwen2.5-VL-3B-Instruct-a16w4 --local-dir Qwen2.5-VL-3B-Instruct-a16w4
```

Alternatively, you can download the compiled model to a host machine and copy it to the Modalix device:

```bash
hf download simaai/Qwen2.5-VL-3B-Instruct-a16w4 --local-dir Qwen2.5-VL-3B-Instruct-a16w4
scp -r Qwen2.5-VL-3B-Instruct-a16w4 sima@<modalix-ip>:/media/nvme/llima/
```

Replace `<modalix-ip>` with the IP address of your Modalix device.

Expected directory structure:

```
/media/nvme/llima/
├── simaai-genai-demo/               # The demo app
└── Qwen2.5-VL-3B-Instruct-a16w4/    # Your downloaded model
```
## Usage

### Run the Application

Navigate to the demo directory and start the application:

```bash
cd /media/nvme/llima/simaai-genai-demo
./run.sh
```

The script detects the installed model(s) and prompts you to select one.

Once the application is running, open a browser and navigate to:

```
https://<modalix-ip>:5000/
```

Replace `<modalix-ip>` with the IP address of your Modalix device.
### API Usage

To use the OpenAI-compatible API, run the model in API mode:

```bash
cd /media/nvme/llima/simaai-genai-demo
./run.sh --httponly --api-only
```

You can then interact with it using curl or Python.
#### Example: Chat Completion

Replace `<YOUR_BASE64_STRING_HERE>` with an actual base64-encoded image string, and `<modalix-ip>` with the IP address of your Modalix device.

```bash
curl -N -k -X POST "https://<modalix-ip>:5000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,<YOUR_BASE64_STRING_HERE>"
            }
          },
          {
            "type": "text",
            "text": "Describe this image"
          }
        ]
      }
    ],
    "stream": true
  }'
```
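The same request can be sent from Python. Below is a minimal sketch that mirrors the curl example: it base64-encodes the image into a data URL and builds the chat-completion payload. The helper names are illustrative, and the sending function assumes the third-party `requests` package is installed on the host; it has not been validated against a live device.

```python
import base64


def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Base64-encode raw image bytes into a data URL the API accepts."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"


def build_chat_payload(image_bytes: bytes, prompt: str) -> dict:
    """Build the OpenAI-style chat-completion payload from the curl example."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url",
                     "image_url": {"url": to_data_url(image_bytes)}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "stream": True,
    }


def stream_chat(base_url: str, image_path: str, prompt: str) -> None:
    """Send the request and print streamed response lines.

    Not called here; requires `pip install requests` and a running device.
    """
    import requests  # assumption: available on the host

    with open(image_path, "rb") as f:
        payload = build_chat_payload(f.read(), prompt)
    # verify=False mirrors curl's -k (device uses a self-signed certificate)
    resp = requests.post(f"{base_url}/v1/chat/completions",
                         json=payload, stream=True, verify=False)
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))
```

A call such as `stream_chat("https://<modalix-ip>:5000", "example.jpg", "Describe this image")` would then print the streamed chunks as they arrive.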
## Limitations

- Quantization: This model is quantized (A16W4) for optimal performance on embedded devices. While this maintains high accuracy, minor deviations from the full-precision model may occur.
- Fixed Resolution: While the standard Qwen2.5-VL architecture supports dynamic input resolutions, this version has been optimized and fixed to 448x448 resolution at compile time to achieve maximum throughput and efficiency on the SiMa.ai MLA.
## Troubleshooting

- `sima-cli` not found: Ensure that `sima-cli` is installed on your Modalix device.
- Model can't be run: Verify the model directory is directly inside `/media/nvme/llima/` and not nested (e.g., `/media/nvme/llima/Qwen2.5-VL-3B-Instruct/Qwen2.5-VL-3B-Instruct/`).
- Permission Denied: Ensure you have read/write permissions for the `/media/nvme` directory.