AIPlans/Qwen3-0.6B-RM-hs2

Text Classification

Generated from Trainer

text-generation-inference

sorakritt committed 17 days ago

Commit 42730db (verified) · 1 parent: 9717aad

Update README.md

Files changed (1)

README.md +9 -15

@@ -9,30 +9,24 @@ tags:
 licence: license
 datasets:
 - Jennny/helpsteer2-helpfulness-preference
+- nvidia/HelpSteer2
+license: mit
+language:
+- en
+pipeline_tag: text-classification
 ---
+<a href="https://aiplans.org" target="_blank" style="margin: 2px;"> <img alt="AIPlans" src="./logos/AI-Plans.svg" style="display: inline-block; vertical-align: middle;"/> </a>
 # Model Card for qwen3-0.6b-RM-hs2
 This model is a fine-tuned version of [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base).
 It has been trained using [TRL](https://github.com/huggingface/trl).
-## Quick start
-```python
-from transformers import pipeline
-text = "The capital of France is Paris."
-rewarder = pipeline(model="sorakritt/qwen3-0.6b-RM-hs2", device="cuda")
-output = rewarder(text)[0]
-print(output["score"])
-```
+Intended Use: Research on model diffing, preference fine-tuning, and evaluation of lightweight LLM behavior changes.
+It was developed for use in the Model Diffing project of AI-Plans.
 ## Training procedure
-This model was trained using prompts with a chosen response >=3 only. It took about 1h 20mins with an A100(40 GB).
+This model is a reward model and was trained using prompts with a chosen response >=3 only. It took about 1h 20mins with an A100(40 GB).
 ### Framework versions