LuxDiT: Lighting Estimation with Video Diffusion Transformer Paper • 2509.03680 • Published Sep 3 • 17
2D Gaussian Splatting with Semantic Alignment for Image Inpainting Paper • 2509.01964 • Published Sep 2 • 7
Lost in Embeddings: Information Loss in Vision-Language Models Paper • 2509.11986 • Published Sep 15 • 28
Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models Paper • 2509.12132 • Published Sep 15 • 6
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation Paper • 2510.08551 • Published Oct 9 • 32
LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal Paper • 2510.15868 • Published Oct 17 • 25
HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives Paper • 2510.20822 • Published Oct 23 • 40
Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story Paper • 2511.15210 • Published 27 days ago • 87
RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models Paper • 2510.10390 • Published Oct 12 • 3
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency Paper • 2311.02772 • Published Nov 5, 2023 • 8
Durian: Dual Reference-guided Portrait Animation with Attribute Transfer Paper • 2509.04434 • Published Sep 4 • 10
Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis Paper • 2509.09595 • Published Sep 11 • 48
Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation Paper • 2509.00428 • Published Aug 30 • 17
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning Paper • 2509.01644 • Published Sep 1 • 33
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning Paper • 2509.08519 • Published Sep 10 • 128