caiyuchen/DAPO-step-0

@@ -18,7 +18,8 @@ base_model:
 # On Predictability of Reinforcement Learning Dynamics for Large Language Models
-![Paper Overview](overview.png)
+![Paper Overview]
 This repository provides one of the models used in our paper **"On Predictability of Reinforcement Learning Dynamics for Large Language Models"** for evaluating and predicting reinforcement learning (RL) dynamics in large language models (LLMs).
@@ -51,8 +52,8 @@ prompt = tokenizer.apply_chat_template(
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
-model = AutoModelForCausalLM.from_pretrained("caiyuchen/DAPO-step-{i}")
-tokenizer = AutoTokenizer.from_pretrained("caiyuchen/DAPO-step-{i}")
+model = AutoModelForCausalLM.from_pretrained("caiyuchen/DAPO-step-0")
+tokenizer = AutoTokenizer.from_pretrained("caiyuchen/DAPO-step-0")
 question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$"
 question_with_instruction = question + "