In a Training Loop 🔄

Asankhaya Sharma PRO

codelion

http://asankhaya.github.io/

AI & ML interests

Creator of OptiLLM, OpenEvolve, Adaptive Classifier, and Ellora. Pioneering a new category in AI infrastructure: inference-time compute for LLMs.

Recent Activity

reacted to their post with 🚀, 👍, and 🔥 4 days ago

Recently, Essential AI released a new 8B base model, https://huggingface.co/EssentialAI/rnj-1. They highlighted the importance of the data mix for pretraining: "In the long run, we expect our methods to automatically represent, transform, and blend data to optimize measurable abilities in pre-training. Our work on modeling data taxonomies led to new approaches for jointly clustering and mixing data distributions under data repetition penalties. Many improvements in our STEM abilities can be traced back to this." This resonates with our recent work on optimal dataset mixing for pretraining, where we saw how having the right mix can increase training efficiency: https://huggingface.co/blog/codelion/optimal-dataset-mixing

codelion's datasets (38)

codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-steering-vectors

Preview • Updated May 13 • 26 • 1

codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts

Preview • Updated May 13 • 26 • 1

codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-dpo-pairs

Preview • Updated May 13 • 33 • 1

codelion/math500-cot-experiment

Viewer • Updated Apr 30 • 1.5k • 37 • 5

codelion/optillmbench

Viewer • Updated Apr 15 • 500 • 36 • 5

codelion/optillm-router-dataset

Viewer • Updated Apr 12 • 2.81k • 92 • 6

codelion/Sky-T1_data_17k

Viewer • Updated Jan 11 • 16.4k • 43 • 1

codelion/worker-safety-qa-eval

Viewer • Updated Jun 20, 2024 • 34 • 124 • 4