pascalhuerten committed on
Commit
4d074f6
·
verified ·
1 Parent(s): 0692b8c

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -110,11 +110,14 @@ db.similarity_search_with_relevance_scores(query, 20)
110
 
111
  ### Finetuning Dataset
112
 
113
- - The model was finetuned on the [German Course Competency Alignment Dataset](https://huggingface.co/datasets/isy-thl/course_competency_alignment_de), which includes alignments of course descriptions to the skill taxonomies of ESCO (European Skills, Competences, Qualifications and Occupations) and GRETA (a competency model for professional teaching competencies in adult education).
 
114
  - This dataset was compiled as part of the **WISY@KI** project, with major contributions from the **Institut für Interaktive Systeme** at the **University of Applied Sciences Lübeck**, the **Kursportal Schleswig-Holstein**, and **Weiterbildung Hessen eV**. Special thanks to colleagues from **MyEduLife** and **Trainspot**.
115
 
116
  ### Finetuning Process
117
 
 
 
118
  - **Hardware Used:**
119
  - Single NVIDIA T4 GPU with 15 GB VRAM.
120
  - **Duration:**
 
110
 
111
  ### Finetuning Dataset
112
 
113
+ - The model was finetuned with data from the [German Course Competency Alignment Dataset](https://huggingface.co/datasets/isy-thl/course_competency_alignment_de), which includes alignments of course descriptions to the skill taxonomies of ESCO (European Skills, Competences, Qualifications and Occupations) and GRETA (a competency model for professional teaching competencies in adult education). About 100 additional pairs of course descriptions and relevant ESCO skills from the databases of **Kursportal Schleswig-Holstein** and **Weiterbildung Hessen eV** were also used during training but could not be published as part of the public dataset.
114
+ - Training gave the best results when the human-validated pairs of course descriptions and relevant skills were supplemented with only about 1500 random samples from the *ESCO Skill Relations* subset. The dataset used for finetuning therefore contained about 2000 samples in total.
115
  - This dataset was compiled as part of the **WISY@KI** project, with major contributions from the **Institut für Interaktive Systeme** at the **University of Applied Sciences Lübeck**, the **Kursportal Schleswig-Holstein**, and **Weiterbildung Hessen eV**. Special thanks to colleagues from **MyEduLife** and **Trainspot**.
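
The data mix described above (all human-validated pairs plus about 1500 random samples from the *ESCO Skill Relations* subset) could be assembled roughly as follows. This is only a sketch, not the authors' script: the function names, the seed, and the `query`/`pos`/`neg` record layout are assumptions for illustration, and the figure of roughly 500 validated pairs is inferred from the stated totals.

```python
import json
import random


def build_training_mix(validated_pairs, esco_relation_pairs, n_random=1500, seed=42):
    """Keep every human-validated pair and add a random subset of ESCO skill relations.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    rng = random.Random(seed)
    # Never request more samples than the relations subset contains.
    n = min(n_random, len(esco_relation_pairs))
    sampled = rng.sample(esco_relation_pairs, n)
    mixed = list(validated_pairs) + sampled
    rng.shuffle(mixed)  # avoid feeding all validated pairs first
    return mixed


def to_jsonl(records):
    """Serialize records one JSON object per line.

    Assumed record layout: {"query": str, "pos": [str, ...], "neg": [str, ...]}.
    """
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

With about 500 validated pairs and the default `n_random=1500`, the result contains about 2000 records, matching the total stated in the README.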
116
 
117
  ### Finetuning Process
118
 
119
+ Finetuning used the scripts included in the [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding/tree/master) repository, with the following environment details and training parameters:
120
+
121
  - **Hardware Used:**
122
  - Single NVIDIA T4 GPU with 15 GB VRAM.
123
  - **Duration:**