EditThinker: Unlocking Iterative Reasoning for Any Image Editor Paper • 2512.05965 • Published 4 days ago • 33
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction Paper • 2512.04987 • Published 5 days ago • 69
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence Paper • 2511.18538 • Published 16 days ago • 247
How Far Are We from Genuinely Useful Deep Research Agents? Paper • 2512.01948 • Published 8 days ago • 50
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published 12 days ago • 168
Monet: Reasoning in Latent Visual Space Beyond Images and Language Paper • 2511.21395 • Published 13 days ago • 15
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs Paper • 2511.07250 • Published 29 days ago • 17
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation Paper • 2510.24821 • Published Oct 28 • 37
MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues Paper • 2510.17722 • Published Oct 20 • 19
IF-VidCap: Can Video Caption Models Follow Instructions? Paper • 2510.18726 • Published Oct 21 • 24
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs Paper • 2510.18876 • Published Oct 21 • 36