---
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
datasets:
- mzhaoshuai/Llama-3.3-70B-Inst-awq_ultrafeedback_1in3
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
# RefAlign: RL with Similarity-based Rewards
**GitHub repository**: https://github.com/mzhaoshuai/RefAlign

**Paper**: [Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data](https://huggingface.co/papers/2504.09895)

The training data is [mzhaoshuai/Llama-3.3-70B-Inst-awq_ultrafeedback_1in3](https://huggingface.co/datasets/mzhaoshuai/Llama-3.3-70B-Inst-awq_ultrafeedback_1in3).
During Reinforcement Learning with Similarity-based Rewards, BERTScore between a sampled generation and the reference answer serves as the reward function.
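The similarity-based reward can be sketched as follows. This is a minimal, dependency-free stand-in: it mirrors BERTScore's precision/recall/F1 structure but uses exact token overlap instead of contextual-embedding matching (the actual setup uses BERTScore with a bart-large-mnli backbone), so treat it as an illustration rather than the implementation.

```python
def similarity_reward(candidate: str, reference: str) -> float:
    """Token-overlap F1 as a toy stand-in for BERTScore.

    BERTScore computes precision/recall/F1 over greedy matches of
    contextual token embeddings; here we substitute exact token
    overlap to keep the sketch self-contained.
    """
    cand_tokens = set(candidate.lower().split())
    ref_tokens = set(reference.lower().split())
    if not cand_tokens or not ref_tokens:
        return 0.0
    overlap = len(cand_tokens & ref_tokens)
    precision = overlap / len(cand_tokens)  # matched fraction of the candidate
    recall = overlap / len(ref_tokens)      # matched fraction of the reference
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In the RL loop, each sampled generation would be scored against its reference answer with such a function, and the resulting scalar used as the reward.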
| Hyper-Parameter | Value |
|:---------------------------------------------------------|--------------------------------------------------------|
|LR|8e-7|
|Batch Size| 512 |
|Epoch| 1 |
|Prompt Length| 400 |
|Generation Length|800|
|Advantage CLIP|0.5|
|Sampled Generations (K)|2|
|BERTScore Model|bart-large-mnli|
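The "Advantage CLIP" and "Sampled Generations (K)" entries can be illustrated together: with K generations per prompt, one common choice is to baseline each reward against the group mean and clip the resulting advantage. The exact baseline and clipping scheme used by RefAlign may differ; this is only a hedged sketch of the general pattern.

```python
import numpy as np

def clipped_advantages(rewards, clip=0.5):
    """Group-baselined, clipped advantages for K sampled generations.

    Assumption (not confirmed by the model card): the baseline is the
    mean reward over the K samples for the same prompt. Advantages are
    then clipped to [-clip, clip], matching the table's 0.5 setting.
    """
    rewards = np.asarray(rewards, dtype=float)
    adv = rewards - rewards.mean()          # subtract the group baseline
    return np.clip(adv, -clip, clip)        # bound the advantage magnitude
```

For example, with K=2 rewards `[0.0, 2.0]` the raw advantages `[-1.0, 1.0]` are clipped to `[-0.5, 0.5]`.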