-
rain1011/pyramid-flow-miniflux
Text-to-Video • Updated • 176 -
TPDiff: Temporal Pyramid Video Diffusion Model
Paper • 2503.09566 • Published • 45 -
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Paper • 2504.08685 • Published • 130 -
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
Paper • 2504.12626 • Published • 51
Collections
Discover the best community collections!
Collections including paper arxiv:2503.09566
-
Wolf: Captioning Everything with a World Summarization Framework
Paper • 2407.18908 • Published • 32 -
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Paper • 2407.19985 • Published • 37 -
TPDiff: Temporal Pyramid Video Diffusion Model
Paper • 2503.09566 • Published • 45 -
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Paper • 2506.07464 • Published • 14
-
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Paper • 2401.09985 • Published • 18 -
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Paper • 2401.09962 • Published • 9 -
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
Paper • 2401.10404 • Published • 10 -
ActAnywhere: Subject-Aware Video Background Generation
Paper • 2401.10822 • Published • 13
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 57 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 52 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 44 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 63
-
High-Quality Image Restoration Following Human Instructions
Paper • 2401.16468 • Published • 15 -
Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding
Paper • 2401.15708 • Published • 12 -
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support
Paper • 2401.14688 • Published • 13 -
TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts
Paper • 2401.14828 • Published • 10
-
rain1011/pyramid-flow-miniflux
Text-to-Video • Updated • 176 -
TPDiff: Temporal Pyramid Video Diffusion Model
Paper • 2503.09566 • Published • 45 -
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Paper • 2504.08685 • Published • 130 -
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
Paper • 2504.12626 • Published • 51
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 57 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 52 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 44 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 63
-
Wolf: Captioning Everything with a World Summarization Framework
Paper • 2407.18908 • Published • 32 -
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Paper • 2407.19985 • Published • 37 -
TPDiff: Temporal Pyramid Video Diffusion Model
Paper • 2503.09566 • Published • 45 -
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Paper • 2506.07464 • Published • 14
-
High-Quality Image Restoration Following Human Instructions
Paper • 2401.16468 • Published • 15 -
Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding
Paper • 2401.15708 • Published • 12 -
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support
Paper • 2401.14688 • Published • 13 -
TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts
Paper • 2401.14828 • Published • 10
-
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Paper • 2401.09985 • Published • 18 -
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Paper • 2401.09962 • Published • 9 -
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
Paper • 2401.10404 • Published • 10 -
ActAnywhere: Subject-Aware Video Background Generation
Paper • 2401.10822 • Published • 13