SentenceTransformer based on jinaai/jina-embeddings-v3

This is a sentence-transformers model finetuned from jinaai/jina-embeddings-v3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: jinaai/jina-embeddings-v3
  • Maximum Sequence Length: 8194 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (transformer): Transformer(
    (auto_model): XLMRobertaLoRA(
      (roberta): XLMRobertaModel(
        (embeddings): XLMRobertaEmbeddings(
          (word_embeddings): ParametrizedEmbedding(
            250002, 1024, padding_idx=1
            (parametrizations): ModuleDict(
              (weight): ParametrizationList(
                (0): LoRAParametrization()
              )
            )
          )
          (token_type_embeddings): ParametrizedEmbedding(
            1, 1024
            (parametrizations): ModuleDict(
              (weight): ParametrizationList(
                (0): LoRAParametrization()
              )
            )
          )
        )
        (emb_drop): Dropout(p=0.1, inplace=False)
        (emb_ln): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder): XLMRobertaEncoder(
          (layers): ModuleList(
            (0-23): 24 x Block(
              (mixer): MHA(
                (rotary_emb): RotaryEmbedding()
                (Wqkv): ParametrizedLinearResidual(
                  in_features=1024, out_features=3072, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
                (inner_attn): FlashSelfAttention(
                  (drop): Dropout(p=0.1, inplace=False)
                )
                (inner_cross_attn): FlashCrossAttention(
                  (drop): Dropout(p=0.1, inplace=False)
                )
                (out_proj): ParametrizedLinear(
                  in_features=1024, out_features=1024, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
              )
              (dropout1): Dropout(p=0.1, inplace=False)
              (drop_path1): StochasticDepth(p=0.0, mode=row)
              (norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): ParametrizedLinear(
                  in_features=1024, out_features=4096, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
                (fc2): ParametrizedLinear(
                  in_features=4096, out_features=1024, bias=True
                  (parametrizations): ModuleDict(
                    (weight): ParametrizationList(
                      (0): LoRAParametrization()
                    )
                  )
                )
              )
              (dropout2): Dropout(p=0.1, inplace=False)
              (drop_path2): StochasticDepth(p=0.0, mode=row)
              (norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            )
          )
        )
        (pooler): XLMRobertaPooler(
          (dense): ParametrizedLinear(
            in_features=1024, out_features=1024, bias=True
            (parametrizations): ModuleDict(
              (weight): ParametrizationList(
                (0): LoRAParametrization()
              )
            )
          )
          (activation): Tanh()
        )
      )
    )
  )
  (pooler): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (normalizer): Normalize()
)
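
The final two modules implement mean pooling over token embeddings (pooling_mode_mean_tokens) followed by L2 normalization, so each sentence embedding is the unit-length average of its token embeddings. A minimal PyTorch sketch of that step, for illustration only (not the library's internal code):

import torch
import torch.nn.functional as F

def mean_pool_and_normalize(token_embeddings, attention_mask):
    # token_embeddings: [batch, seq_len, 1024]; attention_mask: [batch, seq_len]
    mask = attention_mask.unsqueeze(-1).float()
    summed = (token_embeddings * mask).sum(dim=1)    # sum embeddings of real tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)         # number of real tokens per input
    return F.normalize(summed / counts, p=2, dim=1)  # unit-length sentence embeddings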

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers
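
The custom modeling code shipped with the jinaai/jina-embeddings-v3 base also depends on einops; this is an assumption from the base model's requirements rather than something this card states, so install it as well if loading fails with a missing-module error:

pip install einops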

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer(
    "yodyamahesa/paper-recommendation-jinav3-keyword",
    trust_remote_code=True,  # the jina-embeddings-v3 base ships custom modeling code
)
# Run inference
sentences = [
    'users from all over the world increasingly adopt social media for newsgathering, especially during breaking news. breaking news is an unexpected event that is currently developing. early stages of breaking news are usually associated with lots of unverified information, i.e., rumors. efficiently detecting and acting upon rumors in a timely fashion is of high importance to minimize their harmful effects. yet, not all rumors have the potential to spread in social media. high-engaging rumors are those written in a manner that ensures achievement of the highest prevalence among the recipients. they are difficult to detect, spread very fast, and can cause serious damage to society. in this article, we propose a new multi-task convolutional neural network (cnn) attention-based neural network architecture to jointly learn the two tasks of breaking news rumors detection and breaking news rumors popularity prediction in social media. the proposed model learns the salient semantic similarities among important features for detecting high-engaging breaking news rumors and separates them from the rest of the input text. extensive experiments on five real-life datasets of breaking news suggest that our proposed model outperforms all baselines and is capable of detecting breaking news rumors and predicting their future popularity with high accuracy.',
    'perkembangan teknologi membawa perubahan besar bagi kehidupan manusia. salah satu perubahan yang paling menonjol dari adanya perkembangan teknologi adalah munculnya aplikasi-aplikasi yang memiliki kecerdasan buatan. seiring berkembangnya teknologi mendorong para pelaku bisnis beralih ke teknologi artificial intelligence (ai). industri layanan publik mulai mengakomodasi pergesaran dari manual ke digital. hal ini membuktikan bahwa industri layanan publik seperti pemerintahan mulai berbenah untuk beralih ke teknologi ai. dalam mengakomodasi pergeseran tersebut industri pariwisata jakarta menjadi sorotan karena memiliki banyak objek wisata namun tidak semua orang mengetahui mengenai informasi objek wisata yang ada di dki jakarta. data potensi dan permasalahan pariwisata jakarta menyatakan bahwa publikasi dan informasi objek wisata yang terbatas dan kurang komunikatif menjadi salah satu permasalahan yang sangat disoroti oleh pemerintah jakarta dalam pengembangan objek wisata dki jakarta. chatbot merupakan sebuah program komputer bebasis ai yang mampu berinteraksi dengan penggunanya dengan menggunakan bahasa alami. aplikasi chatbot informasi objek wisata dengan pemrosesan bahasa alami (natural language precessing) dibangun dengan menggunakan metode artificial intelligence markup language (aiml) sebagai basis dasar pengetahuan chatbot dan algoritma enhanced confix stripping (ecs) sebagai pengolah input user menjadi kata dasar. aplikasi ini dibangun dan telah diuji dengan menggunakan metode technology acceptance model (tam) dengan menghasilkan angka sebesar 85.56% responden yang menyatakan setuju bahwa aplikasi tidak membutuhkan banyak usaha untuk menggunakannya (perceived ease of use) dan 85.78% responden yang setuju bahwa aplikasi ini dapat meningkatkan kinerja pengguna dalam menemukan infomasi objek wisata (perceived usefulness).',
    "telanaipura sector police is the republic of indonesia's police command structure at the sub-district level in jambi province. in the criminal investigation department, the telecommunication sector in processing criminal data is still done manually assisted with microsoft word so that the processing of the data has not been properly processed in order to carry out data analysis, evaluation and profiling of data on the increasing number and extent of crime. extensively. the aim of reducing crime rates is that it needs to be built web-based applications and tested using weka applications to help process and analyze data using data mining techniques using the decision tree method. in analyzing the data using the age, time, place of occurrence, crime decision tree as an attribute of the case and article theft as a class. in making this prediction application, the php and html programming languages \u200b\u200bare used. and using databases using mysql and in analyzing using microsoft excel and weka. the data inputted is in the form of data on the offender, violation data and admin data. the attributes that are inputted in the weka application are attributes of age, time, place of occurrence, and crime and articles of violation as a class. the output generated from the weka application is in the form of rules or rules from the decision tree while the output generated by the web-based prediction interface results in article violation classes and generates reports on the perpetrator's actions, the number of crimes, crimes per period and per violation article. with the mining data prediction interface criminal classification on the police of the jambi telecommunication sector using the decision tree method can help the telanaipura jambi sector police in evaluating and profiling criminal offenders.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
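
Beyond pairwise similarities, the embeddings can rank a corpus of abstracts against a query, which matches the paper-recommendation use case. A minimal sketch; the query and corpus strings are illustrative placeholders, not samples from the training data:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "yodyamahesa/paper-recommendation-jinav3-keyword",
    trust_remote_code=True,
)

query = "detecting rumors in social media during breaking news"
corpus = [
    "a convolutional neural network for rumor detection on twitter",
    "decision tree based crime data mining for a regional police department",
    "support vector machines for logistic demand forecasting",
]

# model.similarity applies the model's configured similarity function (cosine)
scores = model.similarity(model.encode([query]), model.encode(corpus))  # shape [1, 3]
best_idx = scores.argmax().item()
print(corpus[best_idx])  # expected: the rumor-detection abstract ranks first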

Training Details

Training Dataset

Unnamed Dataset

  • Size: 11,928 training samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    sentence (string): min: 71 tokens, mean: 348.51 tokens, max: 1829 tokens
    label (int): 1: ~8.60%, 2: ~3.80%, 3: ~8.80%, 4: ~7.30%, 5: ~8.00%, 6: ~6.60%, 7: ~6.10%, 8: ~6.70%, 9: ~6.80%, 10: ~4.90%, 11: ~3.80%, 12: ~8.70%, 13: ~7.90%, 14: ~7.00%, 15: ~5.00%
  • Samples (each sentence is followed by its integer label):
    to improve the convergence speed and optimization accuracy of the dung beetle optimizer (dbo), this paper proposes an improved algorithm based on circle mapping and longitudinal-horizontal crossover strategy (cicrdbo). first, the circle method is used to map the initial population to increase diversity. second, the longitudinal-horizontal crossover strategy is applied to enhance the global search ability by ensuring the position updates of the dung beetle. simulations were conducted on 10 benchmark test functions, and the results demonstrate that the improved algorithm performs well in both convergence speed and optimization accuracy. the improved algorithm is further applied to the hyperparameter selection of the random forest classification algorithm for binary classification prediction in the retail industry. various combination comparisons prove the practicality of the improved algorithm, followed by shapley additive explanations (shap) analysis. 9
    background minimally invasive concepts are increasingly influential in modern cardiac surgery. this study aimed to evaluate the effect of completeness of revascularization on clinical outcomes and overall survival in minimally invasive, thoracoscopic coronary artery bypass grafting (cabg) surgery. methods we retrospectively evaluated a consecutive series of 1,149 patients who underwent minimally invasive off-pump cabg with single, double, or triple-vessel revascularization between 2007 and 2018. of these patients, 185 (16.1%) had incomplete revascularization (ir) (group i), and 964 (83.9%) had complete revascularization (cr) (group c). we used gradient boosted propensity score estimation to account for possible confounding variables. results median age was 69 years, interquartile range (iqr) 60–76 years, and median euroscore ii was 4, iqr 2–7. of the 1,149 patients, 495 patients suffered from two vessel disease (vd) and 353 presented with three vd. long-term median follow-up 5.58 (3.27... 11
    setiap tahunnya perguruan tinggi melakukan penerimaan mahasiswa baru secara rutin untuk membuka awal tahun ajaran baru. namun tingginya jumlah mahasiswa yang mengundurkan diri menyebabkan banyaknya jumlah kursi kosong yang tersisa. pengunduran diri yang terjadi bisa diminimalisir apabila seleksi calon mahasiswa baru dilakukan dengan tepat. salah satu caranya dengan membuat model prediksi berbasis machine learning untuk membantu proses seleksi kandidat yang berpotensi menyelesaikan proses penerimaan hingga akhir berdasarkan data yang ada. agar hal tersebut bisa tercapai, dibuatlah model prediksi menggunakan algoritma adaboost sekaligus membandingkan performanya dengan model alogoritma decision tree. untuk memaksimalkan peforma model, maka dilakukan analisa variabel dengan menggunakan chi square dalam proses feature selection- nya. hasil akhir menunjukkan bahwa model prediksi adaboost memiliki peforma yang lebih baik daripada model decision tree dengan skor f-measure 90.9%, precision 83.... 8
  • Loss: BatchHardSoftMarginTripletLoss
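
BatchHardSoftMarginTripletLoss treats every same-label pair in a batch as a positive: for each anchor it selects the hardest positive (the farthest same-label sentence) and the hardest negative (the closest different-label sentence) and minimizes log(1 + exp(d(a, p) - d(a, n))), a soft-margin variant of the triplet loss that needs no margin hyperparameter. A minimal sketch of its construction in sentence-transformers, assuming model is already loaded:

from sentence_transformers import losses

# Soft-margin batch-hard triplet loss over (sentence, label) batches
train_loss = losses.BatchHardSoftMarginTripletLoss(model=model)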

Evaluation Dataset

Unnamed Dataset

  • Size: 2,983 evaluation samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    sentence (string): min: 33 tokens, mean: 364.81 tokens, max: 2651 tokens
    label (int): 1: ~9.50%, 2: ~3.10%, 3: ~7.50%, 4: ~9.10%, 5: ~7.60%, 6: ~7.10%, 7: ~7.60%, 8: ~6.80%, 9: ~5.50%, 10: ~4.30%, 11: ~3.40%, 12: ~7.40%, 13: ~6.20%, 14: ~9.90%, 15: ~5.00%
  • Samples (each sentence is followed by its integer label):
    kemiskinan adalah berbeda-beda dan merefleksikan suatu spektrum orientasi ideologi. faktor yang mempengaruhi tingkat kemiskinan adalah pertumbuhan ekonomi. jadi kemiskinan tidak lagi sekedar masalah kekurangan makanan saja. pertumbuhan keseluruh sektor usaha sangat dibutuhkan dalam upaya menurunkan tingkat kemiskinan. untuk menangani dan berkoordinasi dalam hal-hal yang berkaitan dengan penanggulangan kemiskinan, maka perlu mengklasifikasikan usia dan jenis kelamin dari individu dengan tingkat kesejahteraan 30% terbawah. seiring dengan perkembangan teknologi yang begitu pesat, perkembangan algoritma komputer sedang mengembangkan beberapa algoritma untuk mendukung kemajuan sistem komputerisasi. algoritma yang terkenal diantaranya adalah algoritma decision tree (c4.5), random tree, linear and quadratic discriminant analysis, neural network, least square support vector machines, k-nn, random forest, cart, dan naive bayes. 5
    the quantitative data of logistic demand are the important basis for regional logistic development policy and planning,there are many factors influencing logistic demand,so traditional forecast method can not overall consider all kinds of factors and has lower forecast accuracy.in order to improve the forecast accuracy of logistic demand,combined forecast method is used to set up the combined forecast model based on support vector machines and neural network,firstly support vector machines are used to forecast and forecast basic data are obtained,then residual modification is conducted by bp neural network,and numerical example simulation analysis indicates that the combined forecast model has higher accuracy,is a kind of effective forecast model and provides a new idea for logistic demand forecast. 5
    this paper discusses the problem of cultural identity for learners of french who have inherited an educational system introduced by the colonizers. the historical experience of the caribbean has resulted in a critical and significant difference between what we really are and what we have become by the process. if our african heritage lies at the centre of our cultural identity it gives meaning to the strategies and positioning that can be used in an educational process that acknowledges the “doubleness” of similarity and difference in the constant process of “becoming”. if english can be considered the first foreign language by the majority of the people of the african diaspora, french is a second foreign language, in which learners are stretched linguistically and culturally at the same time. the natural dramatic propensities of blacks have an important position of play in the complexity of doubleness. 2
  • Loss: BatchHardSoftMarginTripletLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • bf16: True
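
These values map directly onto the sentence-transformers trainer API. A sketch of how the run could be reconstructed; the output path and the toy datasets below are placeholders, since the actual training data is an unnamed (sentence, label) dataset:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

# Toy stand-ins: batches need at least two samples per label so the
# batch-hard triplet loss can find positives
train_dataset = Dataset.from_dict({
    "sentence": [
        "an abstract about rumor detection",
        "another abstract about rumor detection",
        "an abstract about demand forecasting",
        "another abstract about demand forecasting",
    ],
    "label": [1, 1, 2, 2],
})
eval_dataset = train_dataset  # placeholder; the card uses a held-out split

model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    eval_strategy="epoch",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    warmup_ratio=0.1,
    bf16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=losses.BatchHardSoftMarginTripletLoss(model=model),
)
trainer.train()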

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step   Training Loss   Validation Loss
1.0      746          0.5721            0.5504
2.0     1492          0.5437            0.5239
3.0     2238          0.5205            0.5041
4.0     2984          0.5170            0.5005
5.0     3730          0.5124            0.5002

At a per-device batch size of 16, the 11,928 training samples yield 746 optimizer steps per epoch, consistent with the step column above.

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 4.1.0
  • Transformers: 4.53.0
  • PyTorch: 2.6.0+cu126
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

BatchHardSoftMarginTripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}