SentenceTransformer based on jinaai/jina-embeddings-v3

This is a sentence-transformers model finetuned from jinaai/jina-embeddings-v3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: jinaai/jina-embeddings-v3
  • Maximum Sequence Length: 8194 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (transformer): Transformer(
    (auto_model): XLMRobertaLoRA(
      (roberta): XLMRobertaModel(
        (embeddings): XLMRobertaEmbeddings(
          (word_embeddings): ParametrizedEmbedding(
            250002, 1024, padding_idx=1
            (parametrizations): ModuleDict(
              (weight): ParametrizationList(
                (0): LoRAParametrization()
              )
            )
          )
          (token_type_embeddings): ParametrizedEmbedding(
            1, 1024
            (parametrizations): ModuleDict(
              (weight): ParametrizationList(
                (0): LoRAParametrization()
              )
            )
          )
        )
        (emb_drop): Dropout(p=0.1, inplace=False)
        (emb_ln): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder): XLMRobertaEncoder(
          (layers): ModuleList(
            (0-23): 24 x Block(
              (mixer): MHA(
                (rotary_emb): RotaryEmbedding()
                (Wqkv): ParametrizedLinearResidual(
                  in_features=1024, out_features=3072, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
                (inner_attn): FlashSelfAttention(
                  (drop): Dropout(p=0.1, inplace=False)
                )
                (inner_cross_attn): FlashCrossAttention(
                  (drop): Dropout(p=0.1, inplace=False)
                )
                (out_proj): ParametrizedLinear(
                  in_features=1024, out_features=1024, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
              )
              (dropout1): Dropout(p=0.1, inplace=False)
              (drop_path1): StochasticDepth(p=0.0, mode=row)
              (norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): ParametrizedLinear(
                  in_features=1024, out_features=4096, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
                (fc2): ParametrizedLinear(
                  in_features=4096, out_features=1024, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
              )
              (dropout2): Dropout(p=0.1, inplace=False)
              (drop_path2): StochasticDepth(p=0.0, mode=row)
              (norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            )
          )
        )
        (pooler): XLMRobertaPooler(
          (dense): ParametrizedLinear(
            in_features=1024, out_features=1024, bias=True
            (parametrizations): ModuleDict(
              (weight): ParametrizationList(
                (0): LoRAParametrization()
              )
            )
          )
          (activation): Tanh()
        )
      )
    )
  )
  (pooler): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (normalizer): Normalize()
)
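
The final two modules implement mean pooling over token embeddings (pooling_mode_mean_tokens) followed by L2 normalization, so each sentence embedding is the unit-length average of its token embeddings. A minimal PyTorch sketch of that step, for illustration only (not the library's internal code):

import torch
import torch.nn.functional as F

def mean_pool_and_normalize(token_embeddings, attention_mask):
    # token_embeddings: [batch, seq_len, 1024]; attention_mask: [batch, seq_len]
    mask = attention_mask.unsqueeze(-1).float()
    summed = (token_embeddings * mask).sum(dim=1)    # sum embeddings of real tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)         # number of real tokens per input
    return F.normalize(summed / counts, p=2, dim=1)  # unit-length sentence embeddings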

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers
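
The custom modeling code shipped with the jinaai/jina-embeddings-v3 base also depends on einops; this is an assumption from the base model's requirements rather than something this card states, so install it as well if loading fails with a missing-module error:

pip install einops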

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer(
    "yodyamahesa/paper-recommendation-jinav3-keyword",
    trust_remote_code=True,  # the jina-embeddings-v3 base ships custom modeling code
)
# Run inference
sentences = [
    'users from all over the world increasingly adopt social media for newsgathering, especially during breaking news. breaking news is an unexpected event that is currently developing. early stages of breaking news are usually associated with lots of unverified information, i.e., rumors. efficiently detecting and acting upon rumors in a timely fashion is of high importance to minimize their harmful effects. yet, not all rumors have the potential to spread in social media. high-engaging rumors are those written in a manner that ensures achievement of the highest prevalence among the recipients. they are difficult to detect, spread very fast, and can cause serious damage to society. in this article, we propose a new multi-task convolutional neural network (cnn) attention-based neural network architecture to jointly learn the two tasks of breaking news rumors detection and breaking news rumors popularity prediction in social media. the proposed model learns the salient semantic similarities among important features for detecting high-engaging breaking news rumors and separates them from the rest of the input text. extensive experiments on five real-life datasets of breaking news suggest that our proposed model outperforms all baselines and is capable of detecting breaking news rumors and predicting their future popularity with high accuracy.',
    'perkembangan teknologi membawa perubahan besar bagi kehidupan manusia. salah satu perubahan yang paling menonjol dari adanya perkembangan teknologi adalah munculnya aplikasi-aplikasi yang memiliki kecerdasan buatan. seiring berkembangnya teknologi mendorong para pelaku bisnis beralih ke teknologi artificial intelligence (ai). industri layanan publik mulai mengakomodasi pergesaran dari manual ke digital. hal ini membuktikan bahwa industri layanan publik seperti pemerintahan mulai berbenah untuk beralih ke teknologi ai. dalam mengakomodasi pergeseran tersebut industri pariwisata jakarta menjadi sorotan karena memiliki banyak objek wisata namun tidak semua orang mengetahui mengenai informasi objek wisata yang ada di dki jakarta. data potensi dan permasalahan pariwisata jakarta menyatakan bahwa publikasi dan informasi objek wisata yang terbatas dan kurang komunikatif menjadi salah satu permasalahan yang sangat disoroti oleh pemerintah jakarta dalam pengembangan objek wisata dki jakarta. chatbot merupakan sebuah program komputer bebasis ai yang mampu berinteraksi dengan penggunanya dengan menggunakan bahasa alami. aplikasi chatbot informasi objek wisata dengan pemrosesan bahasa alami (natural language precessing) dibangun dengan menggunakan metode artificial intelligence markup language (aiml) sebagai basis dasar pengetahuan chatbot dan algoritma enhanced confix stripping (ecs) sebagai pengolah input user menjadi kata dasar. aplikasi ini dibangun dan telah diuji dengan menggunakan metode technology acceptance model (tam) dengan menghasilkan angka sebesar 85.56% responden yang menyatakan setuju bahwa aplikasi tidak membutuhkan banyak usaha untuk menggunakannya (perceived ease of use) dan 85.78% responden yang setuju bahwa aplikasi ini dapat meningkatkan kinerja pengguna dalam menemukan infomasi objek wisata (perceived usefulness).',
    "telanaipura sector police is the republic of indonesia's police command structure at the sub-district level in jambi province. in the criminal investigation department, the telecommunication sector in processing criminal data is still done manually assisted with microsoft word so that the processing of the data has not been properly processed in order to carry out data analysis, evaluation and profiling of data on the increasing number and extent of crime. extensively. the aim of reducing crime rates is that it needs to be built web-based applications and tested using weka applications to help process and analyze data using data mining techniques using the decision tree method. in analyzing the data using the age, time, place of occurrence, crime decision tree as an attribute of the case and article theft as a class. in making this prediction application, the php and html programming languages \u200b\u200bare used. and using databases using mysql and in analyzing using microsoft excel and weka. the data inputted is in the form of data on the offender, violation data and admin data. the attributes that are inputted in the weka application are attributes of age, time, place of occurrence, and crime and articles of violation as a class. the output generated from the weka application is in the form of rules or rules from the decision tree while the output generated by the web-based prediction interface results in article violation classes and generates reports on the perpetrator's actions, the number of crimes, crimes per period and per violation article. with the mining data prediction interface criminal classification on the police of the jambi telecommunication sector using the decision tree method can help the telanaipura jambi sector police in evaluating and profiling criminal offenders.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
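
Beyond pairwise similarities, the embeddings can rank a corpus of abstracts against a query, which matches the paper-recommendation use case. A minimal sketch; the query and corpus strings are illustrative placeholders, not samples from the training data:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "yodyamahesa/paper-recommendation-jinav3-keyword",
    trust_remote_code=True,
)

query = "detecting rumors in social media during breaking news"
corpus = [
    "a convolutional neural network for rumor detection on twitter",
    "decision tree based crime data mining for a regional police department",
    "support vector machines for logistic demand forecasting",
]

# model.similarity applies the model's configured similarity function (cosine)
scores = model.similarity(model.encode([query]), model.encode(corpus))  # shape [1, 3]
best_idx = scores.argmax().item()
print(corpus[best_idx])  # expected: the rumor-detection abstract ranks first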

Training Details

Training Dataset

Unnamed Dataset

  • Size: 11,928 training samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    sentence (string): min: 71 tokens, mean: 348.51 tokens, max: 1829 tokens
    label (int): 1: ~8.60%, 2: ~3.80%, 3: ~8.80%, 4: ~7.30%, 5: ~8.00%, 6: ~6.60%, 7: ~6.10%, 8: ~6.70%, 9: ~6.80%, 10: ~4.90%, 11: ~3.80%, 12: ~8.70%, 13: ~7.90%, 14: ~7.00%, 15: ~5.00%
  • Samples (each sentence is followed by its integer label):
    to improve the convergence speed and optimization accuracy of the dung beetle optimizer (dbo), this paper proposes an improved algorithm based on circle mapping and longitudinal-horizontal crossover strategy (cicrdbo). first, the circle method is used to map the initial population to increase diversity. second, the longitudinal-horizontal crossover strategy is applied to enhance the global search ability by ensuring the position updates of the dung beetle. simulations were conducted on 10 benchmark test functions, and the results demonstrate that the improved algorithm performs well in both convergence speed and optimization accuracy. the improved algorithm is further applied to the hyperparameter selection of the random forest classification algorithm for binary classification prediction in the retail industry. various combination comparisons prove the practicality of the improved algorithm, followed by shapley additive explanations (shap) analysis. 9
    background minimally invasive concepts are increasingly influential in modern cardiac surgery. this study aimed to evaluate the effect of completeness of revascularization on clinical outcomes and overall survival in minimally invasive, thoracoscopic coronary artery bypass grafting (cabg) surgery. methods we retrospectively evaluated a consecutive series of 1,149 patients who underwent minimally invasive off-pump cabg with single, double, or triple-vessel revascularization between 2007 and 2018. of these patients, 185 (16.1%) had incomplete revascularization (ir) (group i), and 964 (83.9%) had complete revascularization (cr) (group c). we used gradient boosted propensity score estimation to account for possible confounding variables. results median age was 69 years, interquartile range (iqr) 60–76 years, and median euroscore ii was 4, iqr 2–7. of the 1,149 patients, 495 patients suffered from two vessel disease (vd) and 353 presented with three vd. long-term median follow-up 5.58 (3.27... 11
    setiap tahunnya perguruan tinggi melakukan penerimaan mahasiswa baru secara rutin untuk membuka awal tahun ajaran baru. namun tingginya jumlah mahasiswa yang mengundurkan diri menyebabkan banyaknya jumlah kursi kosong yang tersisa. pengunduran diri yang terjadi bisa diminimalisir apabila seleksi calon mahasiswa baru dilakukan dengan tepat. salah satu caranya dengan membuat model prediksi berbasis machine learning untuk membantu proses seleksi kandidat yang berpotensi menyelesaikan proses penerimaan hingga akhir berdasarkan data yang ada. agar hal tersebut bisa tercapai, dibuatlah model prediksi menggunakan algoritma adaboost sekaligus membandingkan performanya dengan model alogoritma decision tree. untuk memaksimalkan peforma model, maka dilakukan analisa variabel dengan menggunakan chi square dalam proses feature selection- nya. hasil akhir menunjukkan bahwa model prediksi adaboost memiliki peforma yang lebih baik daripada model decision tree dengan skor f-measure 90.9%, precision 83.... 8
  • Loss: BatchHardSoftMarginTripletLoss
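
BatchHardSoftMarginTripletLoss treats every same-label pair in a batch as a positive: for each anchor it selects the hardest positive (the farthest same-label sentence) and the hardest negative (the closest different-label sentence) and minimizes log(1 + exp(d(a, p) - d(a, n))), a soft-margin variant of the triplet loss that needs no margin hyperparameter. A minimal sketch of its construction in sentence-transformers, assuming model is already loaded:

from sentence_transformers import losses

# Soft-margin batch-hard triplet loss over (sentence, label) batches
train_loss = losses.BatchHardSoftMarginTripletLoss(model=model)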

Evaluation Dataset

Unnamed Dataset

  • Size: 2,983 evaluation samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    sentence (string): min: 33 tokens, mean: 364.81 tokens, max: 2651 tokens
    label (int): 1: ~9.50%, 2: ~3.10%, 3: ~7.50%, 4: ~9.10%, 5: ~7.60%, 6: ~7.10%, 7: ~7.60%, 8: ~6.80%, 9: ~5.50%, 10: ~4.30%, 11: ~3.40%, 12: ~7.40%, 13: ~6.20%, 14: ~9.90%, 15: ~5.00%
  • Samples (each sentence is followed by its integer label):
    kemiskinan adalah berbeda-beda dan merefleksikan suatu spektrum orientasi ideologi. faktor yang mempengaruhi tingkat kemiskinan adalah pertumbuhan ekonomi. jadi kemiskinan tidak lagi sekedar masalah kekurangan makanan saja. pertumbuhan keseluruh sektor usaha sangat dibutuhkan dalam upaya menurunkan tingkat kemiskinan. untuk menangani dan berkoordinasi dalam hal-hal yang berkaitan dengan penanggulangan kemiskinan, maka perlu mengklasifikasikan usia dan jenis kelamin dari individu dengan tingkat kesejahteraan 30% terbawah. seiring dengan perkembangan teknologi yang begitu pesat, perkembangan algoritma komputer sedang mengembangkan beberapa algoritma untuk mendukung kemajuan sistem komputerisasi. algoritma yang terkenal diantaranya adalah algoritma decision tree (c4.5), random tree, linear and quadratic discriminant analysis, neural network, least square support vector machines, k-nn, random forest, cart, dan naive bayes. 5
    the quantitative data of logistic demand are the important basis for regional logistic development policy and planning,there are many factors influencing logistic demand,so traditional forecast method can not overall consider all kinds of factors and has lower forecast accuracy.in order to improve the forecast accuracy of logistic demand,combined forecast method is used to set up the combined forecast model based on support vector machines and neural network,firstly support vector machines are used to forecast and forecast basic data are obtained,then residual modification is conducted by bp neural network,and numerical example simulation analysis indicates that the combined forecast model has higher accuracy,is a kind of effective forecast model and provides a new idea for logistic demand forecast. 5
    this paper discusses the problem of cultural identity for learners of french who have inherited an educational system introduced by the colonizers. the historical experience of the caribbean has resulted in a critical and significant difference between what we really are and what we have become by the process. if our african heritage lies at the centre of our cultural identity it gives meaning to the strategies and positioning that can be used in an educational process that acknowledges the “doubleness” of similarity and difference in the constant process of “becoming”. if english can be considered the first foreign language by the majority of the people of the african diaspora, french is a second foreign language, in which learners are stretched linguistically and culturally at the same time. the natural dramatic propensities of blacks have an important position of play in the complexity of doubleness. 2
  • Loss: BatchHardSoftMarginTripletLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • bf16: True
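
These values map directly onto the sentence-transformers trainer API. A sketch of how the run could be reconstructed; the output path and the toy datasets below are placeholders, since the actual training data is an unnamed (sentence, label) dataset:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

# Toy stand-ins: batches need at least two samples per label so the
# batch-hard triplet loss can find positives
train_dataset = Dataset.from_dict({
    "sentence": [
        "an abstract about rumor detection",
        "another abstract about rumor detection",
        "an abstract about demand forecasting",
        "another abstract about demand forecasting",
    ],
    "label": [1, 1, 2, 2],
})
eval_dataset = train_dataset  # placeholder; the card uses a held-out split

model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    eval_strategy="epoch",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    warmup_ratio=0.1,
    bf16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=losses.BatchHardSoftMarginTripletLoss(model=model),
)
trainer.train()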

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step   Training Loss   Validation Loss
1.0      746          0.5721            0.5504
2.0     1492          0.5437            0.5239
3.0     2238          0.5205            0.5041
4.0     2984          0.5170            0.5005
5.0     3730          0.5124            0.5002

At a per-device batch size of 16, the 11,928 training samples yield 746 optimizer steps per epoch, consistent with the step column above.

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 4.1.0
  • Transformers: 4.53.0
  • PyTorch: 2.6.0+cu126
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

BatchHardSoftMarginTripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}