arxiv:2502.18293
Taneesh Gupta
gupta-tanish
AI & ML interests
Post-Training @MicrosoftResearch
Organizations
None yet
Models (26)
gupta-tanish/llama-off-policy-qwq-10k-perturbation-iter1 • Text Generation • 8B • 1
gupta-tanish/llama-3-8b-instruct-refa-budget_length-256-lamda-1.0-iteration2 • Text Generation • 8B • 1
gupta-tanish/llama-3-8b-instruct-refa-budget_length-256-lamda-20.0-iteration1 • Text Generation • 8B
gupta-tanish/llama-3-8b-instruct-refa-lr-1e-6-beta10-gamma4-lambda-1.0-eos-increase-iteration2-lamda-0.1 • Text Generation • 8B • 3
gupta-tanish/llama-3-8b-instruct-refa-lr-1e-6-beta10-gamma4-lambda-0.1-eos-increase-iteration2 • Text Generation • 8B • 2
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-0.001-lr-1e-6-iteration1 • Text Generation • 8B
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-0.01-lr-1e-6-iteration1 • Text Generation • 8B • 1
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-0.1-lr-1e-6-iteration1 • Text Generation • 8B • 1
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-1.0-lr-1e-6-iteration1 • Text Generation • 8B • 2
gupta-tanish/mistral-7b-instruct-refa-iteration2 • Text Generation • 7B • 1
Datasets (188)
gupta-tanish/llama3-8b-instruct-on-policy-std-4 • 60.8k • 1
gupta-tanish/llama3-8b-instruct-on-policy-with-stats • 60.8k • 1
gupta-tanish/llama3-8b-instruct-on-policy-noise-std-2.0 • 60.8k • 1
gupta-tanish/llama3-8b-instruct-on-policy-noise-std-1.0 • 60.8k • 1
gupta-tanish/llama3-8b-instruct-on-policy-noise-std-0.5 • 60.8k • 1
gupta-tanish/llama3-8b-instruct-on-policy-GRM • 61.1k • 4
gupta-tanish/llama3-8b-instruct-on-policy-ArmoRM • 61.8k • 5
gupta-tanish/llama3-8b-instruct-on-policy-PairRM • 61.8k • 1
gupta-tanish/Qwen2.5-math-1.5B-Instruct_method_cpo_iteration_5 • 2.17k • 2
gupta-tanish/Qwen2.5-math-1.5B-Instruct_method_cpo_iteration_4 • 2.11k • 2