Hugging Face
Building on HF
11.8 TFLOPS
Kshitij Thakkar (kshitijthakkar) · PRO
18 followers · 80 following
Mandark-droid
kshitij-thakkar-2061b924
AI & ML interests
AI observability + MoE efficiency engineer. Building tools that make GenAI traceable, measurable, and production-ready.
Recent Activity
Liked a Space (5 days ago): julien-c/statement-of-purpose
Published an article (7 days ago): Scaling Mixture of Experts: Architecture Search for Billion-Parameter Language Models
Updated a collection (7 days ago): Large MoE Architecture Search (1B-2B)
kshitijthakkar's models (107, sorted by most recently updated)
kshitijthakkar/moe-312m-114m-16x2-12L-baseline-390m · Updated 14 days ago · 40
kshitijthakkar/moe-225m-108m-12x2-10L-baseline-330m · Updated 14 days ago · 28
kshitijthakkar/moe-99m-70m-8x2-8L-tiny-200m-8exp · Updated 14 days ago · 35
kshitijthakkar/moe-141m-89m-8x2-10L-small-250m-8exp · Updated 14 days ago · 48
kshitijthakkar/moe-202m-104m-12x2-10L-medium-300m-12exp · Updated 14 days ago · 40
kshitijthakkar/moe-241m-111m-12x2-12L-balanced-350m-12exp · Updated 14 days ago · 26
kshitijthakkar/moe-353m-130m-16x2-12L-large-400m-16exp · Updated 14 days ago · 37
kshitijthakkar/moe-415m-147m-16x2-12L-xlarge-450m-16exp · Updated 14 days ago · 36
kshitijthakkar/moe-161m-123m-4x2-12L-4exp-large-experts · Updated 14 days ago · 25
kshitijthakkar/moe-198m-114m-8x2-12L-8exp-balanced · Updated 14 days ago · 36
kshitijthakkar/moe-340m-107m-24x2-12L-24exp-specialized · Updated 14 days ago · 40
kshitijthakkar/moe-350m-102m-16x1-12L-top1-routing · Updated 14 days ago · 38
kshitijthakkar/moe-274m-132m-16x4-12L-top4-routing · Updated 14 days ago · 35
kshitijthakkar/moe-240m-103m-12x2-16L-deep-narrow-16l · Updated 14 days ago · 38
kshitijthakkar/moe-270m-132m-12x2-8L-shallow-wide-8l · Updated 14 days ago · 37
kshitijthakkar/moe-229m-111m-12x2-10L-full-attention-no-gqa · Updated 14 days ago · 40
kshitijthakkar/moe-284m-119m-12x2-14L-aggressive-gqa-1kv · Updated 14 days ago · 41
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr5e-06 · Updated 14 days ago · 37
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr1e-05 · Updated 14 days ago · 41
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr3e-05 · Updated 14 days ago · 38
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr5e-05 · Updated 14 days ago · 33
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr1e-04 · Updated 14 days ago · 34
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr2e-04 · Updated 14 days ago · 30
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr3e-04 · Updated 14 days ago · 32
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr5e-04 · Updated 14 days ago · 23
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr1e-03 · Updated 14 days ago · 27
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-bs2-ctx512 · Updated 14 days ago · 29
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-bs2-ctx1024 · Updated 14 days ago · 21
kshitijthakkar/loggenix-moe-0.4B-0.2A-sft-s3.1 · Text Generation · 0.4B · Updated 19 days ago · 123
kshitijthakkar/loggenix-moe-0.6B-base-pt-cpt-test · Updated Jan 5
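Most of the MoE repositories listed here appear to share one naming pattern. The page does not document it, so the reading below is an assumption: `moe-312m-114m-16x2-12L-...` is taken to mean 312M total parameters, 114M active parameters, 16 experts with top-2 routing, and 12 layers, followed by a free-form tag. Entries such as `16x1-12L-top1-routing` and `12x2-16L-deep-narrow-16l` are consistent with that reading. The sketch below splits a repo id into those fields for sorting or filtering; `parse_name`, `MoEName`, and its field names are hypothetical helpers, not part of any Hugging Face API.

```python
import re
from dataclasses import dataclass
from typing import Optional

# ASSUMED pattern (not documented on the page):
#   moe-<total>m-<active>m-<experts>x<top_k>-<layers>L[-<tag>]
NAME_RE = re.compile(
    r"moe-(?P<total>\d+)m-(?P<active>\d+)m-"
    r"(?P<experts>\d+)x(?P<top_k>\d+)-(?P<layers>\d+)L"
    r"(?:-(?P<tag>.+))?$"
)


@dataclass
class MoEName:
    """Hypothetical container for the fields assumed to be encoded in a name."""
    total_m: int      # assumed: total parameters, in millions
    active_m: int     # assumed: parameters active per token, in millions
    num_experts: int  # assumed: experts per MoE layer
    top_k: int        # assumed: experts routed per token
    num_layers: int   # assumed: transformer layers
    tag: str          # free-form suffix, e.g. "baseline-390m" or "...-lr3e-04"

    @property
    def active_fraction(self) -> float:
        # Share of the total parameters assumed to be active per token.
        return self.active_m / self.total_m


def parse_name(repo_id: str) -> Optional[MoEName]:
    """Parse 'owner/name' under the assumed pattern; None if it does not match."""
    name = repo_id.split("/")[-1]
    m = NAME_RE.match(name)  # anchored at the start, so "loggenix-moe-..." is skipped
    if m is None:
        return None
    return MoEName(
        total_m=int(m["total"]),
        active_m=int(m["active"]),
        num_experts=int(m["experts"]),
        top_k=int(m["top_k"]),
        num_layers=int(m["layers"]),
        tag=m["tag"] or "",
    )
```

Names that do not follow the pattern, such as the `loggenix-moe-...` repositories, return `None` rather than being guessed at. Under this reading, the learning-rate and batch-size sweep entries parse to identical architecture fields and differ only in `tag`.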
Jan 5