robinhad committed on
Commit ae9979d · verified · 1 Parent(s): d956028

Update README.md

  1. README.md +23 -19
README.md CHANGED
@@ -1,24 +1,28 @@
- ---
- library_name: transformers
- license: mit
- base_model: intfloat/multilingual-e5-base
- tags:
- - generated_from_trainer
- metrics:
- - precision
- - recall
- - accuracy
- model-index:
- - name: fasttext-quality-score
-   results: []
- ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
  # fasttext-quality-score
 
- This model is a fine-tuned version of [intfloat/multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) on an unknown dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.1726
  - Precision: 0.7268
@@ -28,15 +32,15 @@ It achieves the following results on the evaluation set:
 
  ## Model description
 
- More information needed
 
  ## Intended uses & limitations
 
- More information needed
 
  ## Training and evaluation data
 
- More information needed
 
  ## Training procedure
 
@@ -93,4 +97,4 @@ The following hyperparameters were used during training:
  - Transformers 4.56.1
  - Pytorch 2.6.0a0+ecf3bae40a.nv25.01
  - Datasets 4.0.0
- - Tokenizers 0.22.0
 
+ ---
+ library_name: transformers
+ license: mit
+ base_model: intfloat/multilingual-e5-base
+ tags:
+ - generated_from_trainer
+ metrics:
+ - precision
+ - recall
+ - accuracy
+ model-index:
+ - name: fasttext-quality-score
+   results: []
+ datasets:
+ - lapa-llm/classifier_source
+ language:
+ - uk
+ ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
  # fasttext-quality-score
 
+ This model is a fine-tuned version of [intfloat/multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) on a dataset [transferred from English](https://huggingface.co/datasets/lapa-llm/classifier_source).
  It achieves the following results on the evaluation set:
  - Loss: 0.1726
  - Precision: 0.7268
 
 
  ## Model description
 
+ This model measures the coherence of a given text, defined as its similarity to ELI5 texts from Reddit.
 
  ## Intended uses & limitations
 
+ Data filtering and evaluation of pretraining data at scale.
 
  ## Training and evaluation data
 
+ Take a look at https://github.com/lapa-llm/lapa-llm/blob/main/pretraining/quality-classifiers/fasttext_classifier.py
 
  ## Training procedure
 
 
  - Transformers 4.56.1
  - Pytorch 2.6.0a0+ecf3bae40a.nv25.01
  - Datasets 4.0.0
+ - Tokenizers 0.22.0
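
The intended use named in the updated card (filtering pretraining data by a quality score) could look roughly like the sketch below. This is an illustration only, not the authors' pipeline: `filter_by_quality`, `toy_score`, and the `0.8` threshold are hypothetical names and values, and `toy_score` is a stand-in heuristic rather than the fine-tuned model. In practice `score_fn` would wrap the classifier from this repository (for example through a `transformers` text-classification pipeline), whose repository id and output labels are not given in this diff.

```python
from typing import Callable, Iterable, List, Tuple


def filter_by_quality(
    texts: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Keep (text, score) pairs whose score is at least `threshold`."""
    kept = []
    for text in texts:
        score = score_fn(text)
        if score >= threshold:
            kept.append((text, score))
    return kept


def toy_score(text: str) -> float:
    """Stand-in scorer (NOT the model): share of letters and whitespace."""
    if not text:
        return 0.0
    good = sum(ch.isalpha() or ch.isspace() for ch in text)
    return good / len(text)


# Hypothetical documents; the threshold is an arbitrary illustration value.
docs = [
    "Сонце є зіркою в центрі Сонячної системи.",
    "$$$ 1234 !!! ###",
]
kept = filter_by_quality(docs, toy_score, threshold=0.8)
for text, score in kept:
    print(f"{score:.2f}\t{text}")
```

Passing the scorer in as a callable keeps the filtering loop independent of how the score is produced, so the heuristic can be replaced by the real model without changing the rest of the code.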