caiyuchen/PPO-step-16

@@ -18,9 +18,6 @@ base_model:
 # On Predictability of Reinforcement Learning Dynamics for Large Language Models
-![Overview](overview.png)
 This repository provides one of the models used in our paper **"On Predictability of Reinforcement Learning Dynamics for Large Language Models"** for evaluating and predicting reinforcement learning (RL) dynamics in large language models (LLMs).
@@ -53,8 +50,8 @@ prompt = tokenizer.apply_chat_template(
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
-model = AutoModelForCausalLM.from_pretrained("caiyuchen/DPO-step-16")
-tokenizer = AutoTokenizer.from_pretrained("caiyuchen/DPO-step-16")
+model = AutoModelForCausalLM.from_pretrained("caiyuchen/PPO-step-16")
+tokenizer = AutoTokenizer.from_pretrained("caiyuchen/PPO-step-16")
 question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$"
 question_with_instruction = question + "Please reason step by step, and put your final answer within \boxed{{}}"