Mohamed-Gamil/multilingual-e5-small-JapaneseTeacher

This is a sentence-transformers model fine-tuned from intfloat/multilingual-e5-small. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
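
For reference, the pipeline above (token embeddings from BertModel, mean pooling, L2 normalization) can be reproduced with the plain transformers library. A minimal sketch, assuming the repository's weights and tokenizer load directly through AutoModel and AutoTokenizer, as they do for most Sentence Transformers checkpoints:

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Load the underlying BertModel and its tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained("Mohamed-Gamil/multilingual-e5-small-JapaneseTeacher")
model = AutoModel.from_pretrained("Mohamed-Gamil/multilingual-e5-small-JapaneseTeacher")

sentences = ["こんにちは、世界", "Hello, world"]
batch = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # (batch, seq_len, 384)

# Mean pooling over non-padding tokens, matching pooling_mode_mean_tokens=True above.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# L2 normalization, matching the final Normalize() module.
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 384])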

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Mohamed-Gamil/multilingual-e5-small-JapaneseTeacher")
# Run inference
sentences = [
    "page_content='Do native speakers correctly use counters every time? The answer is no. We often make mistakes or intentionally use wrong ones for the sake of simplicity.  \nCounter: 膳  \n<!-- 🖼️❌ Image not available. Please use `PdfPipelineOptions(generate_picture_images=True)` -->  \nFor example, when you count chopsticks, the counter: 膳 ぜん is right. However, a lot of people count chopsticks with 本 ほん . Although it's not right, 本 ほん is applicable because of the form of chopsticks. What we wanted to say here is that you **don't need be a perfectionist** . Of course, it's better that you can use every counter correctly, but actually, it is a fact that most native speakers can't do it themselves.' metadata={'h3': 'Practical Usages in Reality'}",
    'Are there situations where using a wrong counter is acceptable?',
    'Can you give an example of a casual sentence versus a polite one in Japanese?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
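
The same embeddings support simple semantic search. A small sketch follows; the corpus and query strings are made-up examples for illustration, not part of the training data:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Mohamed-Gamil/multilingual-e5-small-JapaneseTeacher")

# Hypothetical corpus of passages about Japanese usage; replace with your own documents.
corpus = [
    "Counters such as 本 and 膳 are chosen based on the shape of the object being counted.",
    "Polite speech in Japanese often ends sentences with です or ます.",
    "Particles like は and が mark the topic and subject of a sentence.",
]
query = "Which counter should I use for chopsticks?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode(query)

# model.similarity returns a (1, len(corpus)) tensor of cosine similarities.
scores = model.similarity(query_embedding, corpus_embeddings)
best = scores.argmax().item()
print(corpus[best], float(scores[0, best]))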

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.6637
cosine_accuracy@3 0.8561
cosine_accuracy@5 0.9103
cosine_accuracy@10 0.9475
cosine_precision@1 0.6637
cosine_precision@3 0.2854
cosine_precision@5 0.1822
cosine_precision@10 0.0949
cosine_recall@1 0.6633
cosine_recall@3 0.8557
cosine_recall@5 0.9099
cosine_recall@10 0.9475
cosine_ndcg@10 0.8122
cosine_mrr@10 0.7681
cosine_map@100 0.7702
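
These figures come from an information-retrieval evaluation (named ir_evaluator in the training logs below). A hedged sketch of how such metrics are typically computed with sentence-transformers; the queries, corpus, and relevance mapping here are made-up placeholders, not the actual evaluation split:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("Mohamed-Gamil/multilingual-e5-small-JapaneseTeacher")

# Placeholder data: query id -> query text, doc id -> passage text,
# and query id -> set of relevant doc ids. Real evaluation uses a held-out split.
queries = {"q1": "Are there situations where using a wrong counter is acceptable?"}
corpus = {
    "d1": "Native speakers often count chopsticks with 本 even though 膳 is the correct counter.",
    "d2": "Polite speech in Japanese often ends sentences with です or ます.",
    "d3": "Particles like は and が mark the topic and subject of a sentence.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="ir_evaluator",
)
results = evaluator(model)
print(results)  # dict of accuracy@k, precision@k, recall@k, ndcg@10, mrr@10, map@100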

Training Details

Training Dataset

Unnamed Dataset

  • Size: 4,961 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: string; min: 50 tokens, mean: 327.75 tokens, max: 512 tokens
    • anchor: string; min: 10 tokens, mean: 21.79 tokens, max: 40 tokens
  • Samples: positive/anchor pairs; the listed positive examples begin with page_content=' 私わたしが困こまっていたとき、
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
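
MultipleNegativesRankingLoss scores each anchor against its own positive and treats the other positives in the batch as negatives. A minimal sketch of constructing it with the listed parameters, starting from the base checkpoint named above:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.util import cos_sim

# Base checkpoint the fine-tune started from.
model = SentenceTransformer("intfloat/multilingual-e5-small")

# scale=20.0 and cosine similarity mirror the parameters listed above.
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)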
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • gradient_accumulation_steps: 32
  • learning_rate: 2e-05
  • num_train_epochs: 20
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • dataloader_num_workers: 2
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • dataloader_pin_memory: False
  • gradient_checkpointing: True
  • batch_sampler: no_duplicates
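
For reference, a hedged sketch of how these non-default values map onto the sentence-transformers trainer API. The output directory, the one-row stand-in dataset, the placeholder eval_dataset, and save_strategy="epoch" are assumptions not stated in this card:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("intfloat/multilingual-e5-small")
loss = MultipleNegativesRankingLoss(model, scale=20.0)

# Tiny stand-in for the 4,961-sample positive/anchor dataset described above.
train_dataset = Dataset.from_dict({
    "positive": ["Chopsticks are counted with the counter 膳."],
    "anchor": ["Which counter is used for chopsticks?"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="multilingual-e5-small-JapaneseTeacher",  # placeholder path
    eval_strategy="epoch",
    save_strategy="epoch",  # assumed, so load_best_model_at_end can compare per-epoch checkpoints
    per_device_train_batch_size=16,
    gradient_accumulation_steps=32,
    learning_rate=2e-5,
    num_train_epochs=20,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    dataloader_num_workers=2,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    dataloader_pin_memory=False,
    gradient_checkpointing=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # placeholder; the actual run evaluated with a held-out IR evaluator
    loss=loss,
)
trainer.train()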

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 32
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 20
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 2
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: False
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss ir_evaluator_cosine_ndcg@10
-1 -1 - 0.5500
1.0 10 67.2044 0.6343
2.0 20 49.2443 0.6757
3.0 30 21.3377 0.7179
4.0 40 8.6437 0.7687
5.0 50 5.8509 0.7862
6.0 60 5.0683 0.7905
7.0 70 3.6658 0.8006
8.0 80 3.5062 0.8011
9.0 90 3.0544 0.8055
10.0 100 2.7832 0.8060
11.0 110 2.743 0.8090
12.0 120 2.3785 0.8056
13.0 130 2.3046 0.8069
14.0 140 2.4136 0.8119
15.0 150 2.3528 0.8119
16.0 160 2.032 0.8115
17.0 170 2.1875 0.8115
18.0 180 2.0299 0.8124
19.0 190 2.1747 0.8126
20.0 200 2.1729 0.8122
  • The saved checkpoint corresponds to the final row (epoch 20, step 200); its cosine_ndcg@10 of 0.8122 matches the metrics reported in the Evaluation section above.

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}