AI & ML interests
This organization hosts models tuned for sentence and text embedding generation. They can be used with the sentence-transformers package.
Organization Card
SentenceTransformers 🤗 is a Python framework for state-of-the-art sentence, text and image embeddings.
Install the Sentence Transformers library.
pip install -U sentence-transformers
The usage is as simple as:
from sentence_transformers import SentenceTransformer
# 1. Load a pretrained Sentence Transformer model
model = SentenceTransformer("all-MiniLM-L6-v2")
# The sentences to encode
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
# 2. Calculate embeddings by calling model.encode()
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# 3. Calculate the embedding similarities
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6660, 0.1046],
#         [0.6660, 1.0000, 0.1411],
#         [0.1046, 0.1411, 1.0000]])
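To make step 3 more concrete: by default, model.similarity compares embeddings with cosine similarity (unless a model's configuration selects another function). The following is a minimal, dependency-free sketch of that computation, not the library's implementation. The function name cosine_similarity_matrix and the toy 3-dimensional vectors are illustrative assumptions standing in for real 384-dimensional embeddings.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Return the pairwise cosine similarities of equal-length vectors."""
    # Euclidean length of every vector, computed once up front.
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in embeddings]
    return [
        [
            # Dot product divided by the product of the two lengths.
            sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
            for v, norm_v in zip(embeddings, norms)
        ]
        for u, norm_u in zip(embeddings, norms)
    ]


# Hypothetical toy vectors, not real model output.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]

matrix = cosine_similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(value, 4) for value in row])
```

As in the tensor printed above, the diagonal is 1 (each vector compared with itself) and the matrix is symmetric. Vectors pointing in unrelated directions score close to 0.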
Hugging Face makes it easy to collaboratively build and showcase your Sentence Transformers models! You can collaborate with your organization, upload and showcase your own models in your profile ❤️
Documentation
Push your Sentence Transformers models to the Hub ❤️
Find all Sentence Transformers models on the 🤗 Hub
To upload your Sentence Transformers models to the Hugging Face Hub, log in with huggingface-cli login and use the push_to_hub method within the Sentence Transformers library.
from sentence_transformers import SentenceTransformer
# Load or train a model
model = SentenceTransformer(...)
# Push to Hub
model.push_to_hub("my_new_model")
A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers
Each of these datasets provides "english" and "non_english" columns containing parallel sentences. They can be used to make embedding models multilingual.
- sentence-transformers/parallel-sentences-wikititles • 14.7M • 60 • 2
- sentence-transformers/parallel-sentences-tatoeba • 8.35M • 795
- sentence-transformers/parallel-sentences-talks • 19.6M • 1.02k • 12
- sentence-transformers/parallel-sentences-europarl • 49.7M • 275 • 1
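The card does not say how these parallel datasets make a model multilingual. One commonly used approach, assumed here for illustration, is multilingual knowledge distillation: a student model is trained so that its embeddings of both the "english" sentence and the "non_english" translation match a teacher model's embedding of the "english" sentence, typically under a mean squared error objective. The sketch below computes that objective for one sentence pair in plain Python. The function names and the 2-dimensional toy vectors are hypothetical, and no real model is involved.

```python
def mean_squared_error(u, v):
    """Mean squared difference between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def distillation_loss(teacher_en, student_en, student_non_en):
    """Pull both student embeddings toward the teacher's English embedding."""
    return (
        mean_squared_error(teacher_en, student_en)
        + mean_squared_error(teacher_en, student_non_en)
    )


# Hypothetical embeddings for one ("english", "non_english") row.
teacher_en = [0.2, 0.8]      # teacher embedding of the English sentence
student_en = [0.2, 0.6]      # student embedding of the English sentence
student_non_en = [0.0, 0.8]  # student embedding of the translation

loss = distillation_loss(teacher_en, student_en, student_non_en)
print(round(loss, 4))
```

Driving this loss toward zero maps a sentence and its translation to nearly the same vector, which is what lets the student handle languages the teacher was not trained on.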
models 127
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 • Sentence Similarity • 0.1B • 19.4M • 1.14k
- sentence-transformers/all-MiniLM-L12-v2 • Sentence Similarity • 33.4M • 4.08M • 297
- sentence-transformers/embeddinggemma-300m-medical • Sentence Similarity • 9.75k • 44
- sentence-transformers/paraphrase-multilingual-mpnet-base-v2 • Sentence Similarity • 0.3B • 5.7M • 452
- sentence-transformers/stsb-mpnet-base-v2 • Sentence Similarity • 0.1B • 6.2k • 13
- sentence-transformers/paraphrase-mpnet-base-v2 • Sentence Similarity • 0.1B • 1.77M • 47
- sentence-transformers/nli-mpnet-base-v2 • Sentence Similarity • 0.1B • 229k • 15
- sentence-transformers/multi-qa-mpnet-base-dot-v1 • Sentence Similarity • 0.1B • 4.48M • 191
- sentence-transformers/multi-qa-mpnet-base-cos-v1 • Sentence Similarity • 0.1B • 196k • 42
- sentence-transformers/all-mpnet-base-v1 • Sentence Similarity • 0.1B • 28.2k • 12
datasets 94
- sentence-transformers/askubuntu • 13.1k • 95 • 2
- sentence-transformers/askubuntu-questions • 168k • 60 • 1
- sentence-transformers/msmarco • 527M • 2.18k • 9
- sentence-transformers/wiki1m-for-simcse • 1M • 53 • 1
- sentence-transformers/quantized-retrieval-data • 40.7M • 317 • 2
- sentence-transformers/NanoBEIR-en • 63.6k • 6.02k
- sentence-transformers/msmarco-scores-ms-marco-MiniLM-L6-v2 • 241M • 198 • 2
- sentence-transformers/msmarco-msmarco-MiniLM-L6-v3 • 80.6M • 256 • 4
- sentence-transformers/NanoTouche2020-bm25 • 5.84k • 19 • 1
- sentence-transformers/NanoSciFact-bm25 • 3.02k • 43