In a Training Loop 🔄

Lewis Tunstall PRO

lewtun

https://lewtun.github.io/blog/

AI & ML interests

LLMs, LLMs, LLMs

Recent Activity

liked a model 2 days ago

open-thoughts/OpenThinker-Agent-v1

liked a model 2 days ago

EssentialAI/rnj-1-instruct

upvoted a paper about 11 hours ago

Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning

Paper • 2508.09726 • Published Aug 13 • 15

upvoted 2 articles 3 days ago

Yay! Organizations can now publish blog Articles

Article • Jan 20 • 53

We Got Claude to Fine-Tune an Open Source LLM

Article • 5 days ago • 347

upvoted 3 papers 6 days ago

Kimi K2: Open Agentic Intelligence

Paper • 2507.20534 • Published Jul 28 • 8

The BrowserGym Ecosystem for Web Agent Research

Paper • 2412.05467 • Published Dec 6, 2024 • 23

RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

Paper • 2511.07317 • Published 28 days ago • 13

upvoted an article 7 days ago

Transformers v5: Simple model definitions powering the AI ecosystem

Article • 8 days ago • 225

upvoted a paper 11 days ago

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 137

upvoted an article 13 days ago

Continuous batching from first principles

Article • 14 days ago • 254

upvoted an article 18 days ago

Introducing Cogito v2.1

Article • 19 days ago • 17

upvoted a collection 18 days ago

Cogito v2.1

2 items • Updated 19 days ago • 14

upvoted an article about 1 month ago

Ultra-Long Sequence Parallelism: Ulysses + Ring-Attention Technical Principles and Implementation

Article • Sep 16 • 15

upvoted 2 papers about 1 month ago

An efficient probabilistic hardware architecture for diffusion-like models

Paper • 2510.23972 • Published Oct 28 • 3

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Paper • 2510.25992 • Published Oct 29 • 44

upvoted 3 articles about 1 month ago

3+ Years of ML & Society at Hugging Face 🤗🤝🧑‍🤝‍🧑

Article • Oct 29 • 13

huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning

Article • Oct 27 • 71

Aligning to What? Rethinking Agent Generalization in MiniMax M2

Article • Oct 30 • 27

upvoted a collection about 1 month ago

gpt-oss-safeguard

gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are safety reasoning models built-upon gpt-oss • 2 items • Updated Oct 29 • 58

upvoted 2 papers about 1 month ago

Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs

Paper • 2402.12030 • Published Feb 19, 2024 • 3

Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 247