sorakritt committed on
Commit 42730db · verified · 1 Parent(s): 9717aad

Update README.md

Files changed (1)
  1. README.md +9 -15
README.md CHANGED
@@ -9,30 +9,24 @@ tags:
  licence: license
  datasets:
  - Jennny/helpsteer2-helpfulness-preference
+ - nvidia/HelpSteer2
+ license: mit
+ language:
+ - en
+ pipeline_tag: text-classification
  ---
+ <a href="https://aiplans.org" target="_blank" style="margin: 2px;"> <img alt="AIPlans" src="./logos/AI-Plans.svg" style="display: inline-block; vertical-align: middle;"/> </a>

  # Model Card for qwen3-0.6b-RM-hs2

  This model is a fine-tuned version of [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base).
  It has been trained using [TRL](https://github.com/huggingface/trl).
+ Intended Use: Research on model diffing, preference fine-tuning, and evaluation of lightweight LLM behavior changes.
+ It was developed for use in the Model Diffing project of AI-Plans.

- ## Quick start
-
- ```python
- from transformers import pipeline
-
- text = "The capital of France is Paris."
- rewarder = pipeline(model="sorakritt/qwen3-0.6b-RM-hs2", device="cuda")
- output = rewarder(text)[0]
- print(output["score"])
- ```

  ## Training procedure
-
-
-
-
- This model was trained using prompts with a chosen response >=3 only. It took about 1h 20mins with an A100(40 GB).
+ This model is a reward model and was trained using prompts with a chosen response >=3 only. It took about 1h 20mins with an A100(40 GB).

  ### Framework versions

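
Note on the Training procedure line: the README says only that training used prompts "with a chosen response >=3". A minimal sketch of what such a filter might look like follows. It assumes the threshold applies to a per-pair rating of the chosen response. The `PreferencePair` class, its `chosen_score` field, and the toy records are hypothetical illustrations, not the actual dataset schema or the author's preprocessing code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreferencePair:
    """Hypothetical record layout; the real dataset columns may differ."""
    prompt: str
    chosen: str
    rejected: str
    chosen_score: int  # assumed: rating attached to the chosen response


# Threshold taken from the README text ("chosen response >=3").
MIN_CHOSEN_SCORE = 3


def filter_pairs(pairs: List[PreferencePair],
                 min_score: int = MIN_CHOSEN_SCORE) -> List[PreferencePair]:
    """Keep only pairs whose chosen response meets the minimum score."""
    return [p for p in pairs if p.chosen_score >= min_score]


# Toy data, invented purely for illustration.
pairs = [
    PreferencePair("Q1", "good answer", "bad answer", chosen_score=4),
    PreferencePair("Q2", "weak answer", "worse answer", chosen_score=2),
    PreferencePair("Q3", "ok answer", "bad answer", chosen_score=3),
]

kept = filter_pairs(pairs)
print([p.prompt for p in kept])
```

Dropping pairs whose preferred response is itself low-rated is one plausible reading of the README sentence; the commit does not say how the filter was implemented.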