An Empirical Study of DPO Configuration Choices for LLM Alignment
The study releases four text-generation checkpoints on the Hugging Face Hub, each a Llama-3.1-Tulu-3-8B model fine-tuned with DPO on a different preference dataset:

- jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_hh-rlhf
- jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_oasst1
- jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_PKU-SafeRLHF
- jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_ultrafeedback
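If these checkpoints follow the standard Hub layout for causal language models, they can be loaded with `transformers`. The sketch below is a minimal example; the repo id is copied from the listing above, while the prompt, the generation settings, and the assumption that the model ships a chat template (as is typical for Tulu-style models) are illustrative, not taken from the model cards.

```python
# Minimal usage sketch (assumes a standard causal-LM checkpoint with a chat
# template; prompt and generation settings are illustrative).
# Requires: pip install transformers accelerate torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_ultrafeedback"  # from the list above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Format a single-turn conversation with the model's chat template.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate, then decode only the newly produced tokens.
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The same snippet should work for any of the four checkpoints by swapping `model_id`, since they differ only in the preference dataset used for DPO fine-tuning.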