In a Training Loop 🔄

Asankhaya Sharma PRO

codelion

http://asankhaya.github.io/

AI & ML interests

Creator of OptiLLM, OpenEvolve, Adaptive Classifier, and Ellora. Pioneering a new category in AI infrastructure: inference-time compute for LLMs.

Recent Activity

reacted to their post with 🚀 3 days ago

Recently, Essential AI released a new 8B base model, https://huggingface.co/EssentialAI/rnj-1, and highlighted the importance of the data mix for pretraining: "In the long run, we expect our methods to automatically represent, transform, and blend data to optimize measurable abilities in pre-training. Our work on modeling data taxonomies led to new approaches for jointly clustering and mixing data distributions under data repetition penalties. Many improvements in our STEM abilities can be traced back to this." This resonates with our recent work on optimal dataset mixing for pretraining, where we saw how having the right mix can increase training efficiency - https://huggingface.co/blog/codelion/optimal-dataset-mixing

reacted to their post with 👍 3 days ago

reacted to their post with 🔥 3 days ago

codelion's datasets (38)

codelion/synth-1B

Viewer • Updated 29 days ago • 822k • 232

codelion/synth-100M

Viewer • Updated 30 days ago • 100k • 82

codelion/synth-10M

Viewer • Updated 30 days ago • 13.3k • 117

codelion/finewiki-1B

Viewer • Updated Nov 2 • 52.7k • 253 • 2

codelion/finewiki-10M

Viewer • Updated Nov 2 • 4.91k • 538 • 2

codelion/finewiki-100M

Viewer • Updated Nov 2 • 68k • 93 • 2

codelion/fineweb-edu-1B

Viewer • Updated Nov 2 • 970k • 1.49k • 6

codelion/fineweb-edu-100M

Viewer • Updated Nov 2 • 115k • 201 • 3

codelion/fineweb-edu-10M

Viewer • Updated Nov 2 • 9.46k • 241 • 2

codelion/dclm-baseline-1B

Viewer • Updated Nov 2 • 774k • 1.23k • 4

codelion/dclm-baseline-100M

Viewer • Updated Nov 2 • 77.2k • 62 • 2

codelion/dclm-baseline-10M

Viewer • Updated Nov 2 • 7.95k • 123 • 2

codelion/finepdfs-1B

Viewer • Updated Nov 2 • 186k • 738 • 2

codelion/finepdfs-100M

Viewer • Updated Nov 2 • 18.6k • 42 • 2

codelion/finepdfs-10M

Viewer • Updated Nov 2 • 7.54k • 138 • 2

codelion/execution-world-model-dataset

Viewer • Updated Oct 14 • 621 • 47

codelion/SimpleQA-Verified

Viewer • Updated Sep 11 • 1k • 200 • 1

codelion/ifeval-high-quality-dpo

Viewer • Updated Sep 9 • 501 • 73

codelion/Qwen2.5-Coder-0.5B-Instruct-security-preference

Viewer • Updated Aug 2 • 245 • 26

codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context

Viewer • Updated Jul 20 • 400 • 42

codelion/Llama-3.2-1B-Instruct-magpie-tool-calling

Viewer • Updated Jul 18 • 1.2k • 44 • 1

codelion/Qwen3-0.6B-icm-dpo-pairs

Viewer • Updated Jul 18 • 122 • 36

codelion/Qwen3-0.6B-icm

Viewer • Updated Jul 18 • 500 • 49 • 1

codelion/gemma-3-1b-it-magpie-reasoning

Viewer • Updated Jul 18 • 131 • 43 • 2

codelion/Qwen3-0.6B-magpie

Viewer • Updated Jul 12 • 735 • 55 • 1

codelion/Qwen3-0.6B-pts-thought-anchors

Viewer • Updated Jul 10 • 148 • 34 • 2

codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-thought-anchors

Viewer • Updated Jul 10 • 110 • 19 • 2

codelion/Qwen3-0.6B-pts-dpo-pairs

Viewer • Updated May 19 • 681 • 40 • 2

codelion/Qwen3-0.6B-pts-steering-vectors

Viewer • Updated May 19 • 1.38k • 61 • 4

codelion/Qwen3-0.6B-pts

Viewer • Updated May 19 • 1.38k • 45 • 2