Collections including paper arxiv:2312.15715

aMUSEd: An Open MUSE Reproduction

Paper • 2401.01808 • Published Jan 3, 2024 • 31
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Paper • 2401.01885 • Published Jan 3, 2024 • 28
SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity

Paper • 2401.00604 • Published Dec 31, 2023 • 6
LARP: Language-Agent Role Play for Open-World Games

Paper • 2312.17653 • Published Dec 24, 2023 • 33

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Paper • 2312.15715 • Published Dec 25, 2023 • 20

DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation

Paper • 2312.13578 • Published Dec 21, 2023 • 29
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Paper • 2312.15715 • Published Dec 25, 2023 • 20
VidToMe: Video Token Merging for Zero-Shot Video Editing

Paper • 2312.10656 • Published Dec 17, 2023 • 11

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Paper • 2312.15715 • Published Dec 25, 2023 • 20

video segmentation

Tracking Anything with Decoupled Video Segmentation

Paper • 2309.03903 • Published Sep 7, 2023 • 29
ProPainter: Improving Propagation and Transformer for Video Inpainting

Paper • 2309.03897 • Published Sep 7, 2023 • 28
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Paper • 2312.15715 • Published Dec 25, 2023 • 20
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 120

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Paper • 2312.15715 • Published Dec 25, 2023 • 20
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

Paper • 2505.23747 • Published May 29, 2025 • 69
VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 38
Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10, 2025 • 160

object segmentation

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Paper • 2312.15715 • Published Dec 25, 2023 • 20

Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4

Paper • 2312.16171 • Published Dec 26, 2023 • 37
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Paper • 2312.15715 • Published Dec 25, 2023 • 20
Learning Vision from Models Rivals Learning Vision from Data

Paper • 2312.17742 • Published Dec 28, 2023 • 16

Interesting Papers

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Paper • 2312.15715 • Published Dec 25, 2023 • 20
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning

Paper • 2312.14878 • Published Dec 22, 2023 • 15
Unsupervised Universal Image Segmentation

Paper • 2312.17243 • Published Dec 28, 2023 • 20