SentenceTransformer based on jinaai/jina-embeddings-v3

This is a sentence-transformers model finetuned from jinaai/jina-embeddings-v3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: jinaai/jina-embeddings-v3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (transformer): Transformer(
    (auto_model): PeftModelForFeatureExtraction(
      (base_model): LoraModel(
        (model): XLMRobertaLoRA(
          (roberta): XLMRobertaModel(
            (embeddings): XLMRobertaEmbeddings(
              (word_embeddings): ParametrizedEmbedding(
                250002, 1024, padding_idx=1
                (parametrizations): ModuleDict(
                  (weight): ParametrizationList(
                    (0): LoRAParametrization()
                  )
                )
              )
              (token_type_embeddings): ParametrizedEmbedding(
                1, 1024
                (parametrizations): ModuleDict(
                  (weight): ParametrizationList(
                    (0): LoRAParametrization()
                  )
                )
              )
            )
            (emb_drop): Dropout(p=0.1, inplace=False)
            (emb_ln): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            (encoder): XLMRobertaEncoder(
              (layers): ModuleList(
                (0-23): 24 x Block(
                  (mixer): MHA(
                    (rotary_emb): RotaryEmbedding()
                    (Wqkv): ParametrizedLinearResidual(
                      in_features=1024, out_features=3072, bias=True
                      (parametrizations): ModuleDict(
                        (weight): ParametrizationList(
                          (0): LoRAParametrization()
                        )
                      )
                    )
                    (inner_attn): SelfAttention(
                      (drop): Dropout(p=0.1, inplace=False)
                    )
                    (inner_cross_attn): CrossAttention(
                      (drop): Dropout(p=0.1, inplace=False)
                    )
                    (out_proj): lora.Linear(
                      (base_layer): ParametrizedLinear(
                        in_features=1024, out_features=1024, bias=True
                        (parametrizations): ModuleDict(
                          (weight): ParametrizationList(
                            (0): LoRAParametrization()
                          )
                        )
                      )
                      (lora_dropout): ModuleDict(
                        (default): Dropout(p=0.1, inplace=False)
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=1024, out_features=32, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=32, out_features=1024, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                      (lora_magnitude_vector): ModuleDict()
                    )
                  )
                  (dropout1): Dropout(p=0.1, inplace=False)
                  (drop_path1): StochasticDepth(p=0.0, mode=row)
                  (norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                  (mlp): Mlp(
                    (fc1): lora.Linear(
                      (base_layer): ParametrizedLinear(
                        in_features=1024, out_features=4096, bias=True
                        (parametrizations): ModuleDict(
                          (weight): ParametrizationList(
                            (0): LoRAParametrization()
                          )
                        )
                      )
                      (lora_dropout): ModuleDict(
                        (default): Dropout(p=0.1, inplace=False)
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=1024, out_features=32, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=32, out_features=4096, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                      (lora_magnitude_vector): ModuleDict()
                    )
                    (fc2): lora.Linear(
                      (base_layer): ParametrizedLinear(
                        in_features=4096, out_features=1024, bias=True
                        (parametrizations): ModuleDict(
                          (weight): ParametrizationList(
                            (0): LoRAParametrization()
                          )
                        )
                      )
                      (lora_dropout): ModuleDict(
                        (default): Dropout(p=0.1, inplace=False)
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=4096, out_features=32, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=32, out_features=1024, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                      (lora_magnitude_vector): ModuleDict()
                    )
                  )
                  (dropout2): Dropout(p=0.1, inplace=False)
                  (drop_path2): StochasticDepth(p=0.0, mode=row)
                  (norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                )
              )
            )
            (pooler): XLMRobertaPooler(
              (dense): ParametrizedLinear(
                in_features=1024, out_features=1024, bias=True
                (parametrizations): ModuleDict(
                  (weight): ParametrizationList(
                    (0): LoRAParametrization()
                  )
                )
              )
              (activation): Tanh()
            )
          )
        )
      )
    )
  )
  (pooler): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (normalizer): Normalize()
)
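
The architecture printout above shows the jina-embeddings-v3 encoder wrapped in a PEFT adapter (PeftModelForFeatureExtraction): rank-32 lora.Linear modules with 0.1 dropout are attached to each block's attention output projection (out_proj) and MLP layers (fc1, fc2). As a rough, non-authoritative sketch of how such an adapter could be attached with the peft library (the lora_alpha and bias values are assumptions; they are not recoverable from the printout):

from peft import LoraConfig, get_peft_model
from sentence_transformers import SentenceTransformer

# Load the base model; jina-embeddings-v3 ships custom modeling code.
base = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

# r=32 and lora_dropout=0.1 match the lora_A/lora_B shapes and dropout in the
# printout; lora_alpha=64 and bias="none" are assumptions.
lora_config = LoraConfig(
    task_type="FEATURE_EXTRACTION",
    r=32,
    lora_alpha=64,
    lora_dropout=0.1,
    target_modules=["out_proj", "fc1", "fc2"],
    bias="none",
)

# Wrap the underlying Hugging Face module with the adapter and inspect it.
base[0].auto_model = get_peft_model(base[0].auto_model, lora_config)
base[0].auto_model.print_trainable_parameters()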

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Mercity/memory-retrieval-jina-v3-lora")
# Run inference
sentences = [
    'Preparing instructions for potential Brazilian yoga classes excites me—could you curate a professional list of Portuguese phrases for guiding poses and breathing exercises?',
    'The previous attempt at self-study failed because Liam found the standard textbook pronunciation guide recordings to be grating and overly formal, leading him to stop practicing after two weeks.',
    'Chloe has a documented, severe anxiety disorder requiring her to maintain a structured, predictable routine; sudden, high-stress financial calculations or immediate high-stakes decisions trigger significant health setbacks.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.8699, -0.1061],
#         [ 0.8699,  1.0000, -0.1572],
#         [-0.1061, -0.1572,  1.0000]])
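
Because the model is trained for memory retrieval, the same API can be used to rank stored memories against a query. The following is a minimal sketch; the memory strings and the query are made up for illustration.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Mercity/memory-retrieval-jina-v3-lora")

# Hypothetical memory store and query, for illustration only.
memories = [
    "Liam stopped self-studying after two weeks because the textbook audio felt grating.",
    "Chloe needs a predictable routine; high-stress financial decisions trigger setbacks.",
    "Alex practices Sumi-e ink drawing in the evenings as a low-light, meditative hobby.",
]
query = "Suggest calming, budget-friendly evening activities for me."

# Encode the query and the memories, then rank memories by cosine similarity.
query_embedding = model.encode([query])
memory_embeddings = model.encode(memories)
scores = model.similarity(query_embedding, memory_embeddings)[0]

for score, memory in sorted(zip(scores.tolist(), memories), reverse=True):
    print(f"{score:.3f}  {memory}")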

Training Details

Training Dataset

Unnamed Dataset

  • Size: 369,891 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min: 12 tokens, mean: 36.09 tokens, max: 81 tokens
    • sentence_1 (string): min: 20 tokens, mean: 40.88 tokens, max: 78 tokens
    • sentence_2 (string): min: 15 tokens, mean: 36.83 tokens, max: 68 tokens
  • Samples:
    • Sample 1:
      • sentence_0: To achieve sufficient relaxation by 11 PM after a demanding shift, suggest budget-conscious, non-stimulating pursuits that differ from audiobooks and suit my solo living situation.
      • sentence_1: Alex has been working on mastering the art of traditional ink drawing (Sumi-e) as a meditative hobby, which requires minimal light and focus.
      • sentence_2: Maria has set a personal milestone to donate 10% of her memoir's first-year royalties to a burnout recovery nonprofit, tying her publication success directly to the book's perceived authenticity and impact.
    • Sample 2:
      • sentence_0: I'm so pumped about this new grammar series—it's going to make such a difference for my subscribers who keep mixing up noun genders! Can you brainstorm ways to animate those common pitfalls like the -o ending myth?
      • sentence_1: The beta group overwhelmingly preferred short, character-driven skits over abstract quizzes, specifically mentioning that the last tutorial that relied heavily on on-screen text overlays resulted in lower engagement.
      • sentence_2: Alex previously boosted his geometry understanding on the SAT by reviewing sample test questions daily during short 30-minute sessions after school.
    • Sample 3:
      • sentence_0: Jamal pushes safe bets, yet deadline looms like a storm—verify this claim?
      • sentence_1: Maria received an internal promotion review last week, and exceeding expectations on this presentation is the single biggest factor determining her eligibility for the Senior Manager role opening in January.
      • sentence_2: Jamal is currently bogged down trying to reconcile conflicting Q3 sales data from three different regional offices, which he finds deeply frustrating.
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.COSINE",
        "triplet_margin": 0.5
    }
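
With the cosine distance metric and a margin of 0.5, this loss pushes the anchor (sentence_0) to be at least 0.5 closer, in cosine distance, to the positive (sentence_1) than to the negative (sentence_2). A minimal sketch of the computation on already-encoded embeddings:

import torch
import torch.nn.functional as F

def triplet_loss_cosine(anchor, positive, negative, margin=0.5):
    """Triplet loss with cosine distance: max(d(a, p) - d(a, n) + margin, 0)."""
    d_pos = 1 - F.cosine_similarity(anchor, positive)  # cosine distance to positive
    d_neg = 1 - F.cosine_similarity(anchor, negative)  # cosine distance to negative
    return F.relu(d_pos - d_neg + margin).mean()

# Toy check with random embeddings (batch of 4, dimension 1024).
a, p, n = (torch.randn(4, 1024) for _ in range(3))
print(triplet_loss_cosine(a, p, n))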
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 1
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
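
Assuming the standard SentenceTransformerTrainer workflow, the non-default values above translate roughly into the setup below. The output directory, the trust_remote_code flag, the eval split, and the dataset construction are placeholders and assumptions, not documented by the authors; the real training data has 369,891 triplets in sentence_0 / sentence_1 / sentence_2 columns.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

# Placeholder triplet dataset with the documented column names
# (anchor, positive, negative).
train_dataset = Dataset.from_dict({
    "sentence_0": ["..."], "sentence_1": ["..."], "sentence_2": ["..."],
})
eval_dataset = train_dataset  # placeholder; the actual eval split is not documented

loss = losses.TripletLoss(
    model=model,
    distance_metric=losses.TripletDistanceMetric.COSINE,
    triplet_margin=0.5,
)

args = SentenceTransformerTrainingArguments(
    output_dir="memory-retrieval-jina-v3-lora",  # placeholder path
    num_train_epochs=1,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    eval_strategy="steps",
    fp16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()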

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0433 500 0.2143
0.0865 1000 0.1182

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.11.0
  • Datasets: 4.4.1
  • Tokenizers: 0.22.1
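
To reproduce this environment, the library versions above can be pinned at install time (the +cu128 PyTorch build comes from the PyTorch CUDA 12.8 wheel index rather than the default PyPI build):

pip install "sentence-transformers==5.1.2" "transformers==4.57.1" "accelerate==1.11.0" "datasets==4.4.1" "tokenizers==0.22.1"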

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}