World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty Paper • 2512.05927 • Published 4 days ago
Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation Paper • 2512.03534 • Published 6 days ago
Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch Paper • 2512.02395 • Published 7 days ago
CaptionQA: Is Your Caption as Useful as the Image Itself? Paper • 2511.21025 • Published 13 days ago
World in a Frame: Understanding Culture Mixing as a New Challenge for Vision-Language Models Paper • 2511.22787 • Published 12 days ago
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published 20 days ago
Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework Paper • 2511.13189 • Published 22 days ago
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding Paper • 2511.13026 • Published 22 days ago
Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark Paper • 2511.13853 • Published 22 days ago
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space Paper • 2511.10555 • Published 26 days ago
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models Paper • 2511.08577 • Published 28 days ago
Agent READMEs: An Empirical Study of Context Files for Agentic Coding Paper • 2511.12884 • Published 22 days ago
LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls Paper • 2511.09148 • Published 27 days ago
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published 28 days ago
Adaptive Multi-Agent Response Refinement in Conversational Systems Paper • 2511.08319 • Published 28 days ago
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains Paper • 2511.04962 • Published Nov 7
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence Paper • 2510.23538 • Published Oct 27