DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation Paper • 2511.23127 • Published 11 days ago • 42
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models Paper • 2511.13704 • Published 22 days ago • 42
Go with Your Gut: Scaling Confidence for Autoregressive Image Generation Paper • 2509.26376 • Published Sep 30 • 8
FineQuest: Adaptive Knowledge-Assisted Sports Video Understanding via Agent-of-Thoughts Reasoning Paper • 2509.11796 • Published Sep 15
Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation Paper • 2508.10858 • Published Aug 14
FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance Paper • 2505.13437 • Published May 19 • 6
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Paper • 2504.13122 • Published Apr 17 • 20
Temporal Regularization Makes Your Video Generator Stronger Paper • 2503.15417 • Published Mar 19 • 22
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization Paper • 2503.08619 • Published Mar 11 • 20
SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization Paper • 2501.01245 • Published Jan 2 • 5
Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding Paper • 2408.16272 • Published Aug 29, 2024
UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web Paper • 2310.18340 • Published Oct 22, 2023
CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning Paper • 2404.09640 • Published Apr 15, 2024
OmniCreator: Self-Supervised Unified Generation with Universal Editing Paper • 2412.02114 • Published Dec 3, 2024 • 14
GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting Paper • 2405.07472 • Published May 13, 2024
FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs Paper • 2407.02157 • Published Jul 2, 2024