Nikolai Mozgovoi

vonexel

AI & ML interests

None yet

Organizations

None yet

upvoted 12 papers 9 days ago

TiDAR: Think in Diffusion, Talk in Autoregression

Paper • 2511.08923 • Published Nov 12 • 115

LuxDiT: Lighting Estimation with Video Diffusion Transformer

Paper • 2509.03680 • Published Sep 3 • 17

2D Gaussian Splatting with Semantic Alignment for Image Inpainting

Paper • 2509.01964 • Published Sep 2 • 7

Lost in Embeddings: Information Loss in Vision-Language Models

Paper • 2509.11986 • Published Sep 15 • 28

Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models

Paper • 2509.12132 • Published Sep 15 • 6

Human3R: Everyone Everywhere All at Once

Paper • 2510.06219 • Published Oct 7 • 10

ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation

Paper • 2510.08551 • Published Oct 9 • 32

BLIP3o-NEXT: Next Frontier of Native Image Generation

Paper • 2510.15857 • Published Oct 17 • 24

LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal

Paper • 2510.15868 • Published Oct 17 • 25

RL makes MLLMs see better than SFT

Paper • 2510.16333 • Published Oct 18 • 48

Accelerating Vision Transformers with Adaptive Patch Sizes

Paper • 2510.18091 • Published Oct 20 • 6

HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

Paper • 2510.20822 • Published Oct 23 • 40

upvoted a paper 19 days ago

Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story

Paper • 2511.15210 • Published 27 days ago • 87

upvoted a paper 26 days ago

RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models

Paper • 2510.10390 • Published Oct 12 • 3

upvoted 6 papers 3 months ago

Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency

Paper • 2311.02772 • Published Nov 5, 2023 • 8

Durian: Dual Reference-guided Portrait Animation with Attribute Transfer

Paper • 2509.04434 • Published Sep 4 • 10

Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis

Paper • 2509.09595 • Published Sep 11 • 48

Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation

Paper • 2509.00428 • Published Aug 30 • 17

OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning

Paper • 2509.01644 • Published Sep 1 • 33

HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Paper • 2509.08519 • Published Sep 10 • 128