lapa-llm/fasttext-quality-score

Text Classification

Generated from Trainer


robinhad committed on Nov 13

Commit ae9979d · verified · 1 Parent(s): d956028

Update README.md

Files changed (1)

README.md +23 -19


@@ -1,24 +1,28 @@
----
-library_name: transformers
-license: mit
-base_model: intfloat/multilingual-e5-base
-tags:
-- generated_from_trainer
-metrics:
-- precision
-- recall
-- accuracy
-model-index:
-- name: fasttext-quality-score
-  results: []
----
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 # fasttext-quality-score
-This model is a fine-tuned version of [intfloat/multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.1726
 - Precision: 0.7268
@@ -28,15 +32,15 @@ It achieves the following results on the evaluation set:
 ## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
@@ -93,4 +97,4 @@ The following hyperparameters were used during training:
 - Transformers 4.56.1
 - Pytorch 2.6.0a0+ecf3bae40a.nv25.01
 - Datasets 4.0.0
-- Tokenizers 0.22.0

+---
+library_name: transformers
+license: mit
+base_model: intfloat/multilingual-e5-base
+tags:
+- generated_from_trainer
+metrics:
+- precision
+- recall
+- accuracy
+model-index:
+- name: fasttext-quality-score
+  results: []
+datasets:
+- lapa-llm/classifier_source
+language:
+- uk
+---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 # fasttext-quality-score
+This model is a fine-tuned version of [intfloat/multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) on a dataset [transferred from English](https://huggingface.co/datasets/lapa-llm/classifier_source).
 It achieves the following results on the evaluation set:
 - Loss: 0.1726
 - Precision: 0.7268
 ## Model description
+This model measures the coherence of the given text, as defined by similarity to ELI5 texts from Reddit.
 ## Intended uses & limitations
+Data filtering and evaluation of pretraining data at scale.
 ## Training and evaluation data
+Take a look at https://github.com/lapa-llm/lapa-llm/blob/main/pretraining/quality-classifiers/fasttext_classifier.py
 ## Training procedure
 - Transformers 4.56.1
 - Pytorch 2.6.0a0+ecf3bae40a.nv25.01
 - Datasets 4.0.0
+- Tokenizers 0.22.0
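
The updated card names data filtering of pretraining data as the intended use. A minimal sketch of that step might look like the following. It is not the authors' pipeline: `filter_by_quality`, `toy_score`, and the 0.5 threshold are hypothetical names and values chosen for illustration, and `toy_score` is only a stand-in so the snippet runs without downloading anything. In practice the scorer would wrap the classifier itself, for example via the Transformers `text-classification` pipeline, whose output labels for this model are not documented in the card.

```python
from typing import Callable, Iterable, List, Tuple


def filter_by_quality(
    texts: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the model card
) -> List[Tuple[str, float]]:
    """Return (text, score) pairs whose score is at least `threshold`."""
    kept = []
    for text in texts:
        score = score_fn(text)
        if score >= threshold:
            kept.append((text, score))
    return kept


def toy_score(text: str) -> float:
    """Stand-in scorer (assumption): share of words longer than 3 characters.

    Replace with a function that calls the real classifier and maps its
    output to a single float.
    """
    words = text.split()
    if not words:
        return 0.0
    return sum(len(w) > 3 for w in words) / len(words)


if __name__ == "__main__":
    docs = ["aaaa bbbb", "a b", ""]
    for text, score in filter_by_quality(docs, toy_score, threshold=0.5):
        print(f"{score:.2f}\t{text}")
```

Keeping the scorer as a plain callable lets the same filtering loop be reused with a batched or cached scorer when running over a large corpus.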