SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices Paper • 2601.08303 • Published 3 days ago • 12
DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving Paper • 2601.01528 • Published 11 days ago • 17
Orient Anything V2: Unifying Orientation and Rotation Understanding Paper • 2601.05573 • Published 7 days ago • 8
Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals Paper • 2601.05848 • Published 6 days ago • 14
VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction Paper • 2601.05966 • Published 6 days ago • 21
Guiding a Diffusion Transformer with the Internal Dynamics of Itself Paper • 2512.24176 • Published 16 days ago • 7
YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection Paper • 2512.23273 • Published 18 days ago • 13
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation Paper • 2512.23705 • Published 17 days ago • 44
Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published 20 days ago • 58
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning Paper • 2512.20605 • Published 23 days ago • 60
Spatia: Video Generation with Updatable Spatial Memory Paper • 2512.15716 • Published 29 days ago • 31
HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming Paper • 2512.21338 • Published 22 days ago • 21
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations Paper • 2512.21004 • Published 23 days ago • 12
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published 29 days ago • 93
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding Paper • 2512.19693 • Published 24 days ago • 63
Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing Paper • 2512.17909 • Published 27 days ago • 36
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published 28 days ago • 83