jina-embeddings-v5-text-small-mlx

MLX multi-task checkpoint of jina-embeddings-v5-text-small for Apple Silicon: one base model plus 4 LoRA adapters with dynamic task switching.

Elastic Inference Service | ArXiv | Blog

Installation

```shell
pip install mlx tokenizers huggingface_hub
```

Usage

```python
import sys
from huggingface_hub import snapshot_download

model_dir = snapshot_download("jinaai/jina-embeddings-v5-text-small-mlx")
sys.path.append(model_dir)  # utils.py ships inside the snapshot
from utils import load_model

model = load_model(model_dir)

# Retrieval
model.switch_task("retrieval")
q_emb = model.encode(["What is machine learning?"], task_type="retrieval.query")
d_emb = model.encode(["Machine learning is a branch of AI."], task_type="retrieval.passage")

# Switch task in-place (about 20 ms, no extra memory)
model.switch_task("clustering")
emb = model.encode(["Group this document."])
```
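After encoding, retrieval scoring reduces to a dot product, assuming (not stated above, but typical for embedding models) that encode() returns L2-normalized arrays. A stand-in sketch with toy unit-norm vectors in place of the real embeddings:

```python
import numpy as np

# Stand-in embeddings (unit-norm rows), in place of q_emb / d_emb above.
q = np.array([[0.6, 0.8]], dtype=np.float32)                 # one query
docs = np.array([[0.8, 0.6], [0.0, 1.0]], dtype=np.float32)  # two passages

scores = q @ docs.T          # cosine similarity, since rows are unit-norm
best = int(scores.argmax())  # index of the best-matching passage
print(best)
```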

How it works

The pre-merged MLX repos each contain a full copy of the model weights for one task (about 1.1 GB each, 4.5 GB for all four tasks).

This repo instead stores the unmerged base model (1.1 GB) plus 4 LoRA adapters (38 MB each). At runtime, only one copy of the weights lives in memory. Task switching merges and unmerges adapters in-place via matrix arithmetic in about 20 ms, with no additional memory allocation.

Total footprint: about 1.3 GB (1.1 GB base + 4 × 38 MB adapters) for all 4 tasks.
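The merge/unmerge arithmetic can be sketched as follows, with numpy standing in for MLX arrays and the standard LoRA formulation assumed: merging adds the scaled low-rank product to the base weight, and unmerging subtracts the identical product, restoring the base weight up to float rounding. No second copy of W is ever allocated.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 32, 32                                   # r=32, alpha=32 as in this repo
W = rng.standard_normal((d, d)).astype(np.float32)         # base weight
A = rng.standard_normal((r, d)).astype(np.float32) * 0.01  # LoRA down-projection
B = rng.standard_normal((d, r)).astype(np.float32) * 0.01  # LoRA up-projection
W_base = W.copy()                                          # reference copy, for the check only
scale = alpha / r

W += scale * (B @ A)  # merge: activate this task's adapter
W -= scale * (B @ A)  # unmerge: restore the base before switching away

print(np.allclose(W, W_base, atol=1e-4))  # True: base weights restored
```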

Tasks

| Task | task_type | Description |
|---|---|---|
| retrieval | retrieval.query / retrieval.passage | Semantic search |
| text-matching | text-matching | Similarity comparison |
| clustering | clustering | Document grouping |
| classification | classification | Text classification |

Matryoshka dimensions: 32, 64, 128, 256, 512, 768, 1024.
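Truncating to a smaller Matryoshka dimension is a client-side operation: keep the leading components and re-normalize. A minimal sketch, assuming the model outputs L2-normalized 1024-d vectors (a stand-in random vector is used here):

```python
import numpy as np

# Stand-in for a model output: a unit-norm 1024-d float32 vector.
emb = np.random.default_rng(1).standard_normal(1024).astype(np.float32)
emb /= np.linalg.norm(emb)

def truncate(v: np.ndarray, dim: int) -> np.ndarray:
    """Keep the leading `dim` components and restore unit norm."""
    head = v[:dim]
    return head / np.linalg.norm(head)

small = truncate(emb, 256)
print(small.shape)  # (256,)
```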

Precision

Embeddings match the pre-merged checkpoints at cosine similarity > 0.999. No accumulated drift after repeated task switching (tested over 80 merge/unmerge cycles).
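A sketch of the kind of round-trip check behind this claim, with numpy float16 (the checkpoint's storage precision) standing in for MLX weights: merge and unmerge the same low-rank update 80 times and confirm the weights stay essentially identical to the originals.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 32
W = rng.standard_normal((d, d)).astype(np.float16)   # stand-in base weight
W0 = W.copy()                                        # pristine reference
# Stand-in scaled low-rank update (B @ A), small relative to W.
delta = (rng.standard_normal((d, r)) @ rng.standard_normal((r, d))).astype(np.float16) * np.float16(0.001)

for _ in range(80):
    W = W + delta  # merge
    W = W - delta  # unmerge

a, b = W.astype(np.float32).ravel(), W0.astype(np.float32).ravel()
cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(cos > 0.999)  # True: no meaningful drift after 80 cycles
```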

Model Details

- Architecture: Qwen3-0.6B base with task-specific LoRA adapters (r=32, alpha=32)
- Precision: float16
- Embedding dimension: 1024
- Max sequence length: 32768 tokens
- Optimized ops: mx.fast.scaled_dot_product_attention, mx.fast.rope

Citation

```bibtex
@article{mohr2025jina,
  title={Jina Embeddings v5: Universal Embeddings for Any Task, Length, and Language},
  author={Mohr, Isabella and Wang, Bo and G{\"u}nther, Michael and Sturua, Saba and Wang, Yanlong and Mastrapas, Georgios and Wang, Isabelle and Lange, Lennart and Xiao, Han},
  journal={arXiv preprint arXiv:2602.15547},
  year={2025}
}
```