ML Optimization Papers
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Paper
• 2501.09747
• Published
• 28
Tensor Product Attention Is All You Need
Paper
• 2501.06425
• Published
• 90
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Paper
• 2501.06842
• Published
• 16
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Paper
• 2501.03895
• Published
• 52
LTX-Video: Realtime Video Latent Diffusion
Paper
• 2501.00103
• Published
• 50
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper
• 2412.20993
• Published
• 36
Token-Budget-Aware LLM Reasoning
Paper
• 2412.18547
• Published
• 46
TRecViT: A Recurrent Video Transformer
Paper
• 2412.14294
• Published
• 13
iFormer: Integrating ConvNet and Transformer for Mobile Application
Paper
• 2501.15369
• Published
• 13
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Paper
• 2501.12370
• Published
• 11
Return of the Encoder: Maximizing Parameter Efficiency for SLMs
Paper
• 2501.16273
• Published
• 5
Cost-Optimal Grouped-Query Attention for Long-Context LLMs
Paper
• 2503.09579
• Published
• 5
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
Paper
• 2503.00540
• Published
• 3
MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding
Paper
• 2502.03183
• Published
• 5
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
Paper
• 2503.08686
• Published
• 19
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension
Paper
• 2503.08689
• Published
• 4
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Paper
• 2503.07677
• Published
• 86
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization
Paper
• 2503.08619
• Published
• 20
Adaptive Layer-skipping in Pre-trained LLMs
Paper
• 2503.23798
• Published
• 6