Commit dc40ab5 (verified), committed by caiyuchen
1 parent: a16895a

Upload README.md with huggingface_hub
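
A note on the change to the usage snippet: the removed lines passed the literal string `"caiyuchen/DAPO-step-{i}"` to `from_pretrained`. Because that is a plain string rather than an f-string, `{i}` is never substituted, so the snippet could not resolve to a real repository; the commit pins it to `caiyuchen/DAPO-step-0`. If the intent is to load a checkpoint by step, a minimal sketch might look like the following. The helper name `checkpoint_repo_id` is hypothetical, and whether repositories for other steps exist under this naming pattern is an assumption based only on the `{i}` placeholder.

```python
def checkpoint_repo_id(step: int, owner: str = "caiyuchen", prefix: str = "DAPO-step") -> str:
    """Build a Hub repo id such as 'caiyuchen/DAPO-step-0'.

    The '<owner>/<prefix>-<step>' pattern is assumed from the README's
    '{i}' placeholder; only step 0 is confirmed by this commit.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    return f"{owner}/{prefix}-{step}"


repo_id = checkpoint_repo_id(0)
print(repo_id)

# Loading is left commented out because it downloads model weights:
# from transformers import AutoModelForCausalLM, AutoTokenizer
# model = AutoModelForCausalLM.from_pretrained(repo_id)
# tokenizer = AutoTokenizer.from_pretrained(repo_id)
```

Keeping the id construction in one place means a missing substitution shows up as an obviously wrong string before any download is attempted.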

Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -18,7 +18,8 @@ base_model:
 # On Predictability of Reinforcement Learning Dynamics for Large Language Models
 
 
-![Paper Overview](overview.png)
+![Paper Overview]
+
 
 This repository provides one of the models used in our paper **"On Predictability of Reinforcement Learning Dynamics for Large Language Models"** for evaluating and predicting reinforcement learning (RL) dynamics in large language models (LLMs).
 
@@ -51,8 +52,8 @@ prompt = tokenizer.apply_chat_template(
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-model = AutoModelForCausalLM.from_pretrained("caiyuchen/DAPO-step-{i}")
-tokenizer = AutoTokenizer.from_pretrained("caiyuchen/DAPO-step-{i}")
+model = AutoModelForCausalLM.from_pretrained("caiyuchen/DAPO-step-0")
+tokenizer = AutoTokenizer.from_pretrained("caiyuchen/DAPO-step-0")
 
 question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$"
 question_with_instruction = question + "