ColQwen3 8B - VetCoders MLX Edition
Visual document retrieval model with ColBERT-style late interaction (MaxSim scoring), optimized for Apple Silicon via MLX.
Created by M&K (c)2025 The LibraxisAI Team
Model Description
ColQwen3-VetCoders-MLX is a visual document retrieval model converted to Apple MLX format. It produces multi-vector embeddings for both document images and text queries, enabling precise visual document search using late interaction (MaxSim) scoring.
Key Features
- Visual Document Retrieval - Find relevant pages in PDF documents using image understanding
- Late Interaction Ranking - ColBERT-style MaxSim scoring for precision
- Multi-modal Embeddings - Embed both images and text queries into shared 320-dim space
- Apple Silicon Native - Optimized for M1/M2/M3/M4 via MLX framework
Architecture
```
Input (Image or Text)
          │
┌──────────────────────────────┐
│  Qwen3-VL Vision Encoder     │  ← For images: extract visual features
│  (frozen ViT patches)        │
└──────────────────────────────┘
          │
┌──────────────────────────────┐
│  Qwen3 Language Model        │  ← Multimodal token processing
│  (7B parameters)             │
└──────────────────────────────┘
          │
┌──────────────────────────────┐
│  Projection Layer            │  ← Project to 320-dim embedding space
│  (4096 → 320)                │
└──────────────────────────────┘
          │
Multi-vector embeddings [N, 320]
```
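For intuition, here is a minimal MLX sketch of that final projection step; the layer, shapes, and variable names are illustrative only and do not correspond to the model's actual module names:

```python
import mlx.core as mx
import mlx.nn as nn

# Hypothetical projection head: 4096-dim hidden states -> 320-dim embeddings
projection = nn.Linear(4096, 320)

# One hidden state per token/patch coming out of the language model,
# e.g. 1024 image patches for a document page
hidden_states = mx.random.normal((1024, 4096))

embeddings = projection(hidden_states)  # shape: (1024, 320)
# Each row is one vector of the multi-vector representation [N, 320]
```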
Usage
Installation
```bash
pip install mlx mlx-vlm safetensors pillow
```
Loading the Model
```python
from colqwen3_embedder import ColQwen3Embedder

# Initialize embedder (uses env vars or default paths)
embedder = ColQwen3Embedder()
embedder.load()

# Or specify paths directly
embedder = ColQwen3Embedder(
    model_path="LibraxisAI/colqwen3-8b-vetcoders-mlx",
    projection_path="path/to/projection.safetensors",
)
```
Embedding Documents
```python
# Embed a document image
doc_embedding = embedder.embed_image("document_page.png")
# Returns: EmbeddingResult with shape [num_patches, 320]

# Embed a text query
query_embedding = embedder.embed_text("What is the treatment protocol?")
# Returns: EmbeddingResult with shape [num_tokens, 320]
```
Scoring Relevance
```python
# MaxSim scoring for retrieval
score = embedder.maxsim_score(query_embedding, doc_embedding)
# Higher score = more relevant document
```
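Under the hood, MaxSim takes, for every query vector, its best-matching document vector and sums those maxima. A minimal NumPy sketch of that computation (the `maxsim` helper below is illustrative and not part of this package; it assumes the raw embedding arrays have been extracted from the `EmbeddingResult` objects):

```python
import numpy as np

def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT-style late interaction: sum over query tokens of the
    maximum dot-product similarity against all document vectors."""
    # query_vecs: [num_query_tokens, 320], doc_vecs: [num_doc_patches, 320]
    sim = query_vecs @ doc_vecs.T          # [num_query_tokens, num_doc_patches]
    return float(sim.max(axis=1).sum())    # best document match per query token, summed
```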
Batch Processing
```python
# Process multiple documents
documents = ["page1.png", "page2.png", "page3.png"]
doc_embeddings = [embedder.embed_image(doc) for doc in documents]

# Score all documents against the query
scores = [embedder.maxsim_score(query_embedding, doc) for doc in doc_embeddings]

# Get top matches
ranked = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
```
Technical Details
Base Model
Converted from tomoro-ai/Colqwen3-8B-base, which was trained on:
- vidore/colpali_train_set
- Additional document understanding datasets
Weight Mapping
Original Tomoro weights are mapped to an MLX-compatible structure:
- `vlm.model.language_model.*` → `language_model.model.*`
- `vlm.model.visual.*` → `vision_tower.*`
- `embedding_proj_layer.*` → saved separately as projection weights
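As a reference for how such a remap can be implemented, here is a hedged sketch; the `remap_keys` helper and the shard file name are illustrative, and this is not the actual conversion script:

```python
import mlx.core as mx

def remap_keys(weights: dict) -> tuple[dict, dict]:
    """Rename Tomoro-style keys to the MLX layout and split out the projection."""
    model_weights, projection_weights = {}, {}
    for key, value in weights.items():
        if key.startswith("embedding_proj_layer."):
            projection_weights[key] = value
        elif key.startswith("vlm.model.language_model."):
            suffix = key[len("vlm.model.language_model."):]
            model_weights["language_model.model." + suffix] = value
        elif key.startswith("vlm.model.visual."):
            suffix = key[len("vlm.model.visual."):]
            model_weights["vision_tower." + suffix] = value
        else:
            model_weights[key] = value
    return model_weights, projection_weights

# Example usage on one original checkpoint shard:
# weights = mx.load("original-shard.safetensors")
# model_weights, projection_weights = remap_keys(weights)
```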
Embedding Details
- Dimension: 320 (projected from 4096)
- Image tokens: Variable based on image resolution (patches)
- Text tokens: Variable based on query length
- Scoring: MaxSim (maximum similarity) late interaction; see the formula below
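In ColBERT-style late interaction, the score of a document with patch embeddings $d_1, \dots, d_n$ against a query with token embeddings $q_1, \dots, q_m$ is the sum, over query vectors, of the best-matching document vector (with embeddings typically L2-normalized, so the dot product acts as a cosine similarity):

$$
\mathrm{score}(Q, D) = \sum_{i=1}^{m} \max_{1 \le j \le n} \; q_i \cdot d_j
$$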
Performance
Tested on Apple Silicon:
| Device | Image Embedding | Text Embedding | Memory |
|---|---|---|---|
| M3 Max 128GB | ~1.2s | ~0.3s | ~17GB |
| M3 Ultra 512GB | ~0.8s | ~0.2s | ~17GB |
| M2 Ultra 192GB | ~1.5s | ~0.4s | ~17GB |
Files
```
colqwen3-8b-vetcoders-mlx/
├── config.json                        # Model configuration
├── model-00001-of-00007.safetensors   # Model weights (sharded)
├── model-00002-of-00007.safetensors
├── ...
├── model.safetensors.index.json       # Weight index
├── tokenizer.json                     # Tokenizer
├── tokenizer_config.json
├── preprocessor_config.json           # Image preprocessor
└── video_preprocessor_config.json
```
Projection weights (separate file):
```
colqwen3_projection.safetensors        # 4096 → 320 projection layer
```
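The projection weights load like any other safetensors file; a minimal sketch using `mlx.core` (the exact tensor names and layout inside the file are an assumption, not verified here):

```python
import mlx.core as mx

# Load the standalone projection weights as a dict of MLX arrays
projection = mx.load("colqwen3_projection.safetensors")
for name, tensor in projection.items():
    print(name, tensor.shape)  # expect a weight mapping 4096 -> 320 (plus an optional bias)
```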
Limitations
- Requires Apple Silicon (M1/M2/M3/M4) for MLX acceleration
- Large memory footprint (~17GB for inference)
- Optimized for document images, not general photos
Citation
```bibtex
@misc{colqwen3-vetcoders-mlx,
  author       = {LibraxisAI Team},
  title        = {ColQwen3 8B - VetCoders MLX Edition},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/LibraxisAI/colqwen3-8b-vetcoders-mlx}}
}
```
License
Apache 2.0
Created by M&K (c)2025 The LibraxisAI Team
Co-Authored-By: Maciej & Klaudiusz