Qwen2.5-VL-3B-Instruct: Optimized for SiMa.ai Modalix

Overview

This repository contains the Qwen2.5-VL-3B-Instruct model, optimized and compiled for the SiMa.ai Modalix platform.

  • Model Architecture: Qwen2.5-VL (3B parameters)
  • Quantization: INT4 (A16W4)
    • Prompt Processing: A16W4 (16-bit activations, 4-bit weights)
    • Token Generation: A16W4 (16-bit activations, 4-bit weights)
  • Maximum context length: 2048
  • Input Resolution: 448x448 (Fixed)
  • Source Model: Qwen/Qwen2.5-VL-3B-Instruct

Performance

The following performance metrics were measured with a single image input and a 50-token text prompt.

Model                    Precision   Device    Response Rate (tokens/sec)   Time To First Token (sec)
Qwen2.5-VL-3B-Instruct   A16W4       Modalix   19.1                         0.91
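
The figures above give a back-of-the-envelope latency estimate for longer replies: the first token arrives after roughly the TTFT, and each subsequent token at the steady response rate. This is an approximation (it ignores prompt-length effects on TTFT), sketched here as a small helper:

```python
# Rough end-to-end latency estimate from the measured figures above:
#   total_time ~= TTFT + (generated_tokens - 1) / tokens_per_sec
def estimated_latency(num_tokens: int, ttft: float = 0.91, rate: float = 19.1) -> float:
    """Estimate seconds to receive a reply of `num_tokens` tokens."""
    return ttft + max(num_tokens - 1, 0) / rate

print(round(estimated_latency(100), 2))  # -> 6.09 (a ~100-token reply)
```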

Prerequisites

To run this model, you need:

  1. SiMa.ai Modalix Device
  2. SiMa.ai CLI: Installed on your Modalix device.
  3. Hugging Face CLI: For downloading the model.

Installation & Deployment

Follow these steps to deploy the model to your Modalix device.

1. Install LLiMa Demo Application

โš ๏ธ Critical Requirement: This model requires the LLiMa Beta runtime version 2.0.1. You must install the beta version (-t beta) even if you have a previous version of LLiMa installed. Standard releases (e.g., v2.0.0) are not compatible with this model.

On your Modalix device, install the LLiMa beta runtime using the sima-cli:

# Create a directory for LLiMa
cd /media/nvme
mkdir -p llima
cd llima

# Install the LLiMa Beta runtime code
sima-cli install samples/llima -t beta

Note: To only download the LLiMa runtime code, select 🚫 Skip when prompted.

2. Download the Model

Download the compiled model assets from this repository directly to your device.

# Download the model to a local directory
cd /media/nvme/llima
hf download simaai/Qwen2.5-VL-3B-Instruct-a16w4 --local-dir Qwen2.5-VL-3B-Instruct-a16w4

Alternatively, you can download the compiled model to a Host and copy it to the Modalix device:

hf download simaai/Qwen2.5-VL-3B-Instruct-a16w4 --local-dir Qwen2.5-VL-3B-Instruct-a16w4
scp -r Qwen2.5-VL-3B-Instruct-a16w4 sima@<modalix-ip>:/media/nvme/llima/

Replace <modalix-ip> with the IP address of your Modalix device.

Expected Directory Structure:

/media/nvme/llima/
├── simaai-genai-demo/               # The demo app
└── Qwen2.5-VL-3B-Instruct-a16w4/    # Your downloaded model

Usage

Run the Application

Navigate to the demo directory and start the application:

cd /media/nvme/llima/simaai-genai-demo
./run.sh

The script will detect the installed model(s) and prompt you to select one.

Once the application is running, open a browser and navigate to:

https://<modalix-ip>:5000/

Replace <modalix-ip> with the IP address of your Modalix device.

API Usage

To use the OpenAI-compatible API, run the model in API mode:

cd /media/nvme/llima/simaai-genai-demo
./run.sh --httponly --api-only

You can interact with it using curl or Python.

Example: Chat Completion

# Note: You need to replace <YOUR_BASE64_STRING_HERE> with an actual base64 encoded image string.
curl -N -k -X POST "https://<modalix-ip>:5000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,<YOUR_BASE64_STRING_HERE>"
            }
          },
          {
            "type": "text",
            "text": "Describe this image"
          }
        ]
      }
    ],
    "stream": true
  }'

Replace <modalix-ip> with the IP address of your Modalix device.
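
The same request can be built from Python. The sketch below only constructs the OpenAI-style payload (the part that is fixed by the API shape shown above); the commented-out usage at the bottom shows one hypothetical way to send it with the standard library, assuming the device uses a self-signed certificate:

```python
import base64

def build_chat_payload(image_bytes: bytes, prompt: str, stream: bool = True) -> dict:
    """Build an OpenAI-style chat/completions payload with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "stream": stream,
    }

# Hypothetical usage (replace <modalix-ip>; cert verification disabled because
# the demo serves HTTPS with a self-signed certificate):
#
# import json, ssl, urllib.request
# payload = build_chat_payload(open("photo.jpg", "rb").read(), "Describe this image")
# req = urllib.request.Request(
#     "https://<modalix-ip>:5000/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# ctx = ssl._create_unverified_context()
# with urllib.request.urlopen(req, context=ctx) as resp:
#     for line in resp:
#         print(line.decode(), end="")
```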

Limitations

  • Quantization: This model is quantized (A16W4) for optimal performance on embedded devices. While this maintains high accuracy, minor deviations from the full-precision model may occur.
  • Fixed Resolution: While the standard Qwen2.5-VL architecture supports dynamic input resolutions, this version has been specifically optimized and fixed to 448x448 resolution at compile time to achieve maximum throughput and efficiency on the SiMa.ai MLA.
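
Because the input resolution is fixed at 448x448, clients may want to scale images down before encoding them. How the runtime itself preprocesses oversized inputs is not documented here; the helper below is one reasonable client-side approach, computing an aspect-preserving "letterbox" size and padding for the 448x448 target:

```python
def letterbox_dims(width: int, height: int, target: int = 448):
    """Compute the scaled size and padding needed to fit an image into a
    target x target square while preserving its aspect ratio."""
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_w, pad_h = target - new_w, target - new_h
    # Split padding evenly: (left/right, top/bottom)
    return (new_w, new_h), (pad_w // 2, pad_h // 2)

print(letterbox_dims(1920, 1080))  # -> ((448, 252), (0, 98))
```

The returned size and padding can then be applied with any image library before base64-encoding the result for the API.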

Troubleshooting

  • sima-cli not found: Ensure that sima-cli is installed on your Modalix device.
  • Model not detected or fails to run: Verify the model directory sits directly inside /media/nvme/llima/ and is not nested (e.g., /media/nvme/llima/Qwen2.5-VL-3B-Instruct/Qwen2.5-VL-3B-Instruct/).
  • Permission Denied: Ensure you have read/write permissions for the /media/nvme directory.
