Think Before You Move: Latent Motion Reasoning for Text-to-Motion Generation Paper • 2512.24100 • Published 12 days ago
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges Paper • 2512.11362 • Published about 1 month ago
Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition Paper • 2503.06984 • Published Mar 10, 2025
Beyond Description: Cognitively Benchmarking Fine-Grained Action for Embodied Agents Paper • 2511.18685 • Published Nov 24, 2025