# jina-embeddings-v5-text-small-mlx
MLX multi-task checkpoint for jina-embeddings-v5-text-small on Apple Silicon. One base model with 4 LoRA adapters for dynamic task switching.
Elastic Inference Service | ArXiv | Blog
## Installation

```bash
pip install mlx tokenizers huggingface_hub
```
## Usage

```python
from huggingface_hub import snapshot_download
from utils import load_model

model_dir = snapshot_download("jinaai/jina-embeddings-v5-text-small-mlx")
model = load_model(model_dir)

# Retrieval: queries and passages use distinct task types
model.switch_task("retrieval")
q_emb = model.encode(["What is machine learning?"], task_type="retrieval.query")
d_emb = model.encode(["Machine learning is a branch of AI."], task_type="retrieval.passage")

# Switch task (about 20 ms, in-place, no extra memory)
model.switch_task("clustering")
emb = model.encode(["Group this document."])
```
## How it works
The pre-merged MLX repos each contain a full copy of the model weights for one task (about 1.1 GB each, roughly 4.5 GB for all four tasks).

This repo instead stores the unmerged base model (1.1 GB) plus 4 LoRA adapters (38 MB each). At runtime, only one copy of the weights lives in memory. Task switching merges and unmerges adapters in place via matrix arithmetic in about 20 ms, with no additional memory allocation.

Total footprint: 1.3 GB for all 4 tasks.
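The merge/unmerge arithmetic can be sketched in NumPy (standing in for MLX; the shapes, scale, and adapter values below are illustrative stand-ins, not the shipped weights or the repo's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 32, 32            # hidden size, LoRA rank, LoRA alpha
scale = alpha / r
W = rng.standard_normal((d, d)).astype(np.float32)   # shared base weight
W0 = W.copy()                                        # reference copy for checking

def lora_pair():
    # one low-rank (B, A) pair per task; the task delta is scale * B @ A
    return ((rng.standard_normal((d, r)) * 0.02).astype(np.float32),
            (rng.standard_normal((r, d)) * 0.02).astype(np.float32))

adapters = {"retrieval": lora_pair(), "clustering": lora_pair()}

# merge the retrieval adapter: W <- W + scale * B @ A (in place, no second copy of W)
B, A = adapters["retrieval"]
W += scale * (B @ A)

# switching tasks: subtract the old delta, add the new one
W -= scale * (B @ A)
B, A = adapters["clustering"]
W += scale * (B @ A)

# unmerging the last adapter recovers the base weights
W -= scale * (B @ A)
drift = float(np.abs(W - W0).max())
```

Because only low-rank deltas are added and subtracted, a single copy of `W` serves every task; the small per-task adapters are the only extra storage.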
## Tasks

| Task | `task_type` | Description |
|---|---|---|
| retrieval | `retrieval.query` / `retrieval.passage` | Semantic search |
| text-matching | `text-matching` | Similarity comparison |
| clustering | `clustering` | Document grouping |
| classification | `classification` | Text classification |
Matryoshka dimensions: 32, 64, 128, 256, 512, 768, 1024.
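With Matryoshka embeddings, a shorter vector is obtained by keeping the leading coordinates and re-normalizing. A minimal sketch (the array here is random stand-in data, not real model output):

```python
import numpy as np

# stand-in for a batch of full 1024-dim embeddings
emb = np.random.default_rng(1).standard_normal((2, 1024)).astype(np.float32)

dim = 256                                # any of 32, 64, 128, 256, 512, 768, 1024
small = emb[:, :dim].copy()              # keep the leading coordinates
small /= np.linalg.norm(small, axis=1, keepdims=True)  # re-normalize for cosine use
```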
## Precision

Cosine similarity > 0.999 against the pre-merged checkpoints. No accumulated drift after repeated task switching (tested over 80 merge/unmerge cycles).
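The drift claim can be reproduced in miniature: merge and unmerge a fixed delta 80 times on float16 weights and compare against the original. A NumPy sketch with synthetic values (the actual test compares full checkpoints, not toy matrices):

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 64, 32
W = rng.standard_normal((d, d)).astype(np.float16)   # float16, as shipped
W0 = W.copy()
# a fixed low-rank delta, computed once in float32
delta = ((rng.standard_normal((d, r)) * 0.02) @
         (rng.standard_normal((r, d)) * 0.02)).astype(np.float32)

for _ in range(80):                      # 80 merge/unmerge cycles
    W = (W.astype(np.float32) + delta).astype(np.float16)  # merge
    W = (W.astype(np.float32) - delta).astype(np.float16)  # unmerge

max_drift = float(np.abs(W.astype(np.float32) - W0.astype(np.float32)).max())
```

Because the same delta is added and subtracted each cycle, rounding error stays bounded at the float16 ulp level rather than accumulating.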
## Model Details

- Architecture: Qwen3-0.6B base with task-specific LoRA adapters (r=32, alpha=32)
- Precision: float16
- Embedding dimension: 1024
- Max sequence length: 32768 tokens
- Optimized ops: `mx.fast.scaled_dot_product_attention`, `mx.fast.rope`
## Citation

```bibtex
@article{mohr2025jina,
  title={Jina Embeddings v5: Universal Embeddings for Any Task, Length, and Language},
  author={Mohr, Isabella and Wang, Bo and G{\"u}nther, Michael and Sturua, Saba and Wang, Yanlong and Mastrapas, Georgios and Wang, Isabelle and Lange, Lennart and Xiao, Han},
  journal={arXiv preprint arXiv:2602.15547},
  year={2025}
}
```