Update README.md

README.md (CHANGED)
### Finetuning Dataset

- The model was finetuned with data from the [German Course Competency Alignment Dataset](https://huggingface.co/datasets/isy-thl/course_competency_alignment_de), which includes alignments of course descriptions to the skill taxonomies of ESCO (European Skills, Competences, Qualifications and Occupations) and GRETA (a competency model for professional teaching competencies in adult education). About 100 additional pairs of course descriptions and relevant ESCO skills from the databases of **Kursportal Schleswig-Holstein** and **Weiterbildung Hessen eV** were used during training but could not be included in the public dataset (see the loading sketch after this list).
- Best results were achieved when the human-validated pairs of course descriptions and relevant skills were supplemented with only about 1500 random samples from the *ESCO Skill Relations* subset. The dataset used during finetuning therefore comprised about 2000 samples in total.
- This dataset was compiled as part of the **WISY@KI** project, with major contributions from the **Institut für Interaktive Systeme** at the **University of Applied Sciences Lübeck**, the **Kursportal Schleswig-Holstein**, and **Weiterbildung Hessen eV**. Special thanks to colleagues from **MyEduLife** and **Trainspot**.
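
The public portion of the finetuning data can be inspected directly from the Hugging Face Hub. The snippet below is a minimal sketch using the `datasets` library; it assumes the default configuration loads without naming a subset, and the exact subset/split names (e.g. for the *ESCO Skill Relations* samples) are not confirmed here.

```python
from datasets import load_dataset

# Load the public German Course Competency Alignment Dataset from the Hub.
# NOTE: subset/config and split names are assumptions; pass a config name
# if the dataset requires one.
dataset = load_dataset("isy-thl/course_competency_alignment_de")
print(dataset)
```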

### Finetuning Process

For finetuning, the scripts included in the [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding/tree/master) repository were used. The environment details and training parameters are listed below.
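
The FlagEmbedding finetuning scripts expect their training data as JSONL records of the form `{"query": ..., "pos": [...], "neg": [...]}`. The sketch below shows how course/skill pairs might be converted into that format; the example texts, the output path, and the negative skill labels are illustrative assumptions, not the exact preprocessing used for this model.

```python
import json

# Illustrative (course description, relevant skills) pairs; the real data came
# from the alignment dataset and the partner course databases described above.
pairs = [
    {
        "course": "Grundlagen der Python-Programmierung für Einsteiger ...",
        "pos_skills": ["Python (Programmiersprache)", "Computerprogrammierung"],
        "neg_skills": ["Buchhaltung"],  # hard negatives, e.g. mined separately
    },
]

# Write one JSON object per line in the format the finetuning scripts consume.
with open("train_data.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        record = {
            "query": pair["course"],     # the course description
            "pos": pair["pos_skills"],   # relevant (positive) skill labels
            "neg": pair["neg_skills"],   # irrelevant (negative) skill labels
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```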
- **Hardware Used:**
  - Single NVIDIA T4 GPU with 15 GB VRAM.
- **Duration:**