Asankhaya Sharma
PRO
codelion
383 followers · 21 following
http://asankhaya.github.io/
asankhaya
codelion
asankhaya
AI & ML interests
Creator of OptiLLM, OpenEvolve, Adaptive Classifier, and Ellora. Pioneering a new category in AI infrastructure: inference-time compute for LLMs.
Recent Activity
Reacted to their post with 🚀, 4 days ago:
Recently, Essential AI released a new 8B base model, https://huggingface.co/EssentialAI/rnj-1. They highlighted the importance of the data mix for pretraining: "In the long run, we expect our methods to automatically represent, transform, and blend data to optimize measurable abilities in pre-training. Our work on modeling data taxonomies led to new approaches for jointly clustering and mixing data distributions under data repetition penalties. Many improvements in our STEM abilities can be traced back to this." This resonates with our recent work on optimal dataset mixing for pretraining, where we saw that having the right mix can increase the efficiency of training: https://huggingface.co/blog/codelion/optimal-dataset-mixing
codelion's datasets (38)
Sort: Recently updated
codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-steering-vectors • Preview • Updated May 13 • 26 • 1
codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts • Preview • Updated May 13 • 26 • 1
codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-dpo-pairs • Preview • Updated May 13 • 33 • 1
codelion/math500-cot-experiment • Viewer • Updated Apr 30 • 1.5k • 37 • 5
codelion/optillmbench • Viewer • Updated Apr 15 • 500 • 36 • 5
codelion/optillm-router-dataset • Viewer • Updated Apr 12 • 2.81k • 92 • 6
codelion/Sky-T1_data_17k • Viewer • Updated Jan 11 • 16.4k • 43 • 1
codelion/worker-safety-qa-eval • Viewer • Updated Jun 20, 2024 • 34 • 124 • 4