In a Training Loop 🔄
Asankhaya Sharma
PRO
codelion
382 followers · 21 following
http://asankhaya.github.io/
asankhaya
codelion
asankhaya
AI & ML interests
Creator of OptiLLM, OpenEvolve, Adaptive Classifier, and Ellora. Pioneering a new category in AI infrastructure: inference-time compute for LLMs.
Recent Activity
Reacted to their post with 🚀, 👍, and 🔥 (3 days ago):
Recently, Essential AI released a new 8B base model, https://huggingface.co/EssentialAI/rnj-1. They highlighted the importance of the data mix for pretraining: "In the long run, we expect our methods to automatically represent, transform, and blend data to optimize measurable abilities in pre-training. Our work on modeling data taxonomies led to new approaches for jointly clustering and mixing data distributions under data repetition penalties. Many improvements in our STEM abilities can be traced back to this." This resonates with our recent work on optimal dataset mixing for pretraining, where we saw that having the right mix can increase training efficiency: https://huggingface.co/blog/codelion/optimal-dataset-mixing
codelion's datasets (38)
codelion/synth-1B • Viewer • Updated 29 days ago • 822k • 232
codelion/synth-100M • Viewer • Updated 30 days ago • 100k • 82
codelion/synth-10M • Viewer • Updated 30 days ago • 13.3k • 117
codelion/finewiki-1B • Viewer • Updated Nov 2 • 52.7k • 253 • 2
codelion/finewiki-10M • Viewer • Updated Nov 2 • 4.91k • 538 • 2
codelion/finewiki-100M • Viewer • Updated Nov 2 • 68k • 93 • 2
codelion/fineweb-edu-1B • Viewer • Updated Nov 2 • 970k • 1.49k • 6
codelion/fineweb-edu-100M • Viewer • Updated Nov 2 • 115k • 201 • 3
codelion/fineweb-edu-10M • Viewer • Updated Nov 2 • 9.46k • 241 • 2
codelion/dclm-baseline-1B • Viewer • Updated Nov 2 • 774k • 1.23k • 4
codelion/dclm-baseline-100M • Viewer • Updated Nov 2 • 77.2k • 62 • 2
codelion/dclm-baseline-10M • Viewer • Updated Nov 2 • 7.95k • 123 • 2
codelion/finepdfs-1B • Viewer • Updated Nov 2 • 186k • 738 • 2
codelion/finepdfs-100M • Viewer • Updated Nov 2 • 18.6k • 42 • 2
codelion/finepdfs-10M • Viewer • Updated Nov 2 • 7.54k • 138 • 2
codelion/execution-world-model-dataset • Viewer • Updated Oct 14 • 621 • 47
codelion/SimpleQA-Verified • Viewer • Updated Sep 11 • 1k • 200 • 1
codelion/ifeval-high-quality-dpo • Viewer • Updated Sep 9 • 501 • 73
codelion/Qwen2.5-Coder-0.5B-Instruct-security-preference • Viewer • Updated Aug 2 • 245 • 26
codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context • Viewer • Updated Jul 20 • 400 • 42
codelion/Llama-3.2-1B-Instruct-magpie-tool-calling • Viewer • Updated Jul 18 • 1.2k • 44 • 1
codelion/Qwen3-0.6B-icm-dpo-pairs • Viewer • Updated Jul 18 • 122 • 36
codelion/Qwen3-0.6B-icm • Viewer • Updated Jul 18 • 500 • 49 • 1
codelion/gemma-3-1b-it-magpie-reasoning • Viewer • Updated Jul 18 • 131 • 43 • 2
codelion/Qwen3-0.6B-magpie • Viewer • Updated Jul 12 • 735 • 55 • 1
codelion/Qwen3-0.6B-pts-thought-anchors • Viewer • Updated Jul 10 • 148 • 34 • 2
codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-thought-anchors • Viewer • Updated Jul 10 • 110 • 19 • 2
codelion/Qwen3-0.6B-pts-dpo-pairs • Viewer • Updated May 19 • 681 • 40 • 2
codelion/Qwen3-0.6B-pts-steering-vectors • Viewer • Updated May 19 • 1.38k • 61 • 4
codelion/Qwen3-0.6B-pts • Viewer • Updated May 19 • 1.38k • 45 • 2