Article • Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand • 5 days ago • 54
Toto: Time Series Optimized Transformer for Observability Paper • 2407.07874 • Published Jul 10, 2024 • 34
A decoder-only foundation model for time-series forecasting Paper • 2310.10688 • Published Oct 14, 2023 • 7
The Smol Training Playbook 📚 • Featured • 2.55k • The secrets to building world-class LLMs
Gradio Hackathon Registration Winter 25 📝 • Featured • 179 • Gradio Agents & MCP Hackathon Winter 2025 Registration Page
Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization Paper • 2509.23202 • Published Sep 27 • 27
Made with Jean Zay Collection • Work performed using Jean Zay Supercomputer resources from GENCI-IDRIS • 4 items • Updated Oct 28