caiyuchen committed on
Commit 9c47b53 · verified · 1 Parent(s): 434684b

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +2 -5
README.md CHANGED
@@ -18,9 +18,6 @@ base_model:
  # On Predictability of Reinforcement Learning Dynamics for Large Language Models
 
 
- ![Overview](overview.png)
-
-
 
  This repository provides one of the models used in our paper **"On Predictability of Reinforcement Learning Dynamics for Large Language Models"** for evaluating and predicting reinforcement learning (RL) dynamics in large language models (LLMs).
 
@@ -53,8 +50,8 @@ prompt = tokenizer.apply_chat_template(
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
- model = AutoModelForCausalLM.from_pretrained("caiyuchen/DPO-step-16")
- tokenizer = AutoTokenizer.from_pretrained("caiyuchen/DPO-step-16")
+ model = AutoModelForCausalLM.from_pretrained("caiyuchen/PPO-step-16")
+ tokenizer = AutoTokenizer.from_pretrained("caiyuchen/PPO-step-16")
 
  question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$"
  question_with_instruction = question + "Please reason step by step, and put your final answer within \boxed{{}}"