leoisufa committed on
Commit
6cdccef
·
verified ·
1 Parent(s): 0043817

Upload folder using huggingface_hub

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +169 -3
  3. assets/agent.jpg +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ assets/agent.jpg filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,169 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ <div align="center">
2
+ <!-- Project Title -->
3
+ <h1>
4
+ MotionAgent: Fine-grained Controllable Video Generation via<br>
5
+ Motion Field Agent
6
+ </h1>
7
+ <!-- Conference Info -->
8
+ <p><em>International Conference on Computer Vision, ICCV 2025.</em></p>
9
+ <!-- Project Badges -->
10
+ <p>
11
+ <a href="https://arxiv.org/abs/2502.03207">
12
+ <img src="https://img.shields.io/badge/arXiv-2502.03207-b31b1b.svg" alt="arXiv"/>
13
+ </a>
14
+ <a href="https://huggingface.co/leoisufa/MotionAgent">
15
+ <img src="https://img.shields.io/badge/HuggingFace-Model-yellow.svg" alt="HuggingFace"/>
16
+ </a>
17
+ </p>
18
+ </div>
19
+
20
+
21
+ <div align="center">
22
+ <strong>Xinyao Liao<sup>1,2</sup></strong>,
23
+ <strong>Xianfang Zeng<sup>2</sup></strong>,
24
+ <strong>Liao Wang<sup>2</sup></strong>,
25
+ <strong>Gang Yu<sup>2*</sup></strong>,
26
+ <strong>Guosheng Lin<sup>1*</sup></strong>,
27
+ <strong>Chi Zhang<sup>3</sup></strong>
28
+ <br><br>
29
+ <b>
30
+ <sup>1</sup> Nanyang Technological University 
31
+ <sup>2</sup> StepFun 
32
+ <sup>3</sup> Westlake University
33
+ </b>
34
+ </div>
35
+
36
+ ## 🧩 Overview
37
+ <p align="center">
38
+ <img src="assets/agent.jpg" alt="Pipeline of Motion Field Agent" width="100%">
39
+ </p>
40
+
41
+ MotionAgent is a novel framework that enables **fine-grained motion control** for text-guided image-to-video generation. At its core is a **motion field agent** that parses motion information in text prompts and converts it into explicit *object trajectories* and *camera extrinsics*. These motion representations are analytically integrated into a unified optical flow, which conditions a diffusion-based image-to-video model to generate videos with precise and flexible motion control. An optional rethinking step further refines motion alignment by iteratively correcting the agent’s previous actions.
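As a rough illustration of how camera extrinsics and object trajectories could be merged into a single flow field, the sketch below uses a standard pinhole-camera model. It is an assumption made for illustration, not necessarily the exact formulation used by MotionAgent: the symbols `K` (intrinsics), `D` (per-pixel depth), `(R_t, t_t)` (camera pose at frame `t` relative to the first frame), `M_k` (mask of object `k`), and `d_{k,t}` (its image-space displacement at frame `t`) are all introduced here.

```latex
% Back-project pixel p (homogeneous form \tilde{p}) using depth D and intrinsics K
\mathbf{X}(\mathbf{p}) = D(\mathbf{p})\, K^{-1} \tilde{\mathbf{p}}

% Flow induced by the camera pose (R_t, t_t) at frame t
\mathbf{F}^{\mathrm{cam}}_t(\mathbf{p})
  = \pi\!\big(K\,(R_t\,\mathbf{X}(\mathbf{p}) + \mathbf{t}_t)\big) - \mathbf{p},
\qquad \pi\big([x, y, z]^\top\big) = [x/z,\; y/z]^\top

% First-order combination: add the displacement of object k inside its mask M_k
\mathbf{F}_t(\mathbf{p})
  = \mathbf{F}^{\mathrm{cam}}_t(\mathbf{p})
  + \sum_k \mathbb{1}[\mathbf{p} \in M_k]\, \mathbf{d}_{k,t}
```

Under these assumptions, a static camera (`R_t = I`, `t_t = 0`) gives zero camera flow, so the unified flow reduces to the object displacements alone.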
42
+
43
+ ## 🎥 Demo
44
+ <p align="center">
45
+ <a href="https://www.youtube.com/watch?v=O9WW2UpXsAI" target="_blank">
46
+ <img src="https://img.youtube.com/vi/O9WW2UpXsAI/maxresdefault.jpg"
47
+ alt="MotionAgent Demo Video"
48
+ width="80%"
49
+ style="max-width:900px; border-radius:10px; box-shadow:0 0 10px rgba(0,0,0,0.15);">
50
+ </a>
51
+ <br>
52
+ <em>Click the image above to watch the full video on YouTube 🎬</em>
53
+ </p>
54
+
55
+ ## 🛠️ Dependencies and Installation
56
+ Follow the steps below to set up **MotionAgent** and run the demo smoothly 💫
57
+ ### 🔹 1. Clone the Repository
58
+ Clone the official GitHub repository and enter the project directory:
59
+ ```bash
60
+ git clone https://github.com/leoisufa/MotionAgent.git
61
+ cd MotionAgent
62
+ ```
63
+ ### 🔹 2. Environment Setup
64
+ ```bash
65
+ # Create and activate conda environment
66
+ conda create -n motionagent python=3.10 -y
67
+ conda activate motionagent
68
+
69
+ # Install PyTorch with CUDA 12.4 support
70
+ pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124
71
+
72
+ # Install project dependencies
73
+ pip install -r requirements.txt
74
+ ```
75
+ ### 🔹 3. Install Grounded-Segment-Anything Dependencies
76
+ MotionAgent relies on external segmentation and grounding models.
77
+ Follow the steps below to install [Grounded-Segment-Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything):
78
+ ```bash
79
+ # Navigate to the models directory (run from the MotionAgent project root)
80
+ cd models
81
+
82
+ # Clone the Grounded-Segment-Anything repository
83
+ git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git
84
+
85
+ # Enter the cloned directory
86
+ cd Grounded-Segment-Anything
87
+
88
+ # Install Segment Anything
89
+ python -m pip install -e segment_anything
90
+
91
+ # Install Grounding DINO
92
+ pip install --no-build-isolation -e GroundingDINO
93
+ ```
94
+
95
+ ### 🔹 4. Install Metric3D Dependencies
96
+ MotionAgent relies on an external monocular depth estimation model.
97
+ Follow the steps below to install [Metric3D](https://github.com/YvanYin/Metric3D):
98
+ ```bash
99
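+ # Navigate to the models directory (run from the MotionAgent project root; if you are still inside models/Grounded-Segment-Anything from step 3, use `cd ..` instead)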
100
+ cd models
101
+
102
+ # Clone the Metric3D repository
103
+ git clone https://github.com/YvanYin/Metric3D.git
104
+ ```
105
+
106
+ ## 🧱 Download Models
107
+
108
+ To run **MotionAgent**, please download all pretrained and auxiliary models listed below, and organize them under the `ckpts/` directory as shown in the example structure.
109
+
110
+ ### 1️⃣ **Optical Flow ControlNet Weights**
111
+ Download from
112
+ 👉 [Hugging Face (MotionAgent)](https://huggingface.co/leoisufa/MotionAgent)
113
+ and place the files in `ckpts/controlnet/`.
114
+
115
+ ### 2️⃣ **Stable Video Diffusion**
116
+ Download from
117
+ 👉 [Hugging Face (MOFA-Video-Hybrid)](https://huggingface.co/MyNiuuu/MOFA-Video-Hybrid/tree/main/ckpts/mofa/stable-video-diffusion-img2vid-xt-1-1)
118
+ and save the model to `ckpts/stable-video-diffusion-img2vid-xt-1-1/`.
119
+
120
+ ### 3️⃣ **Grounding DINO**
121
+ Download the grounding model checkpoint using the command below:
122
+ ```bash
123
+ wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
124
+ ```
125
+ Then place it at `ckpts/groundingdino_swint_ogc.pth`.
126
+
127
+ ### 4️⃣ **Metric Depth Estimator**
128
+ Download from
129
+ 👉 [Hugging Face (Metric3D)](https://huggingface.co/onnx-community/metric3d-vit-small)
130
+ and place the checkpoint at `ckpts/metric_depth_vit_small_800k.pth`.
131
+
132
+ ### 5️⃣ **Segment Anything**
133
+ Download the segmentation model using:
134
+ ```bash
135
+ wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
136
+ ```
137
+ Then place it at `ckpts/sam_vit_h_4b8939.pth`.
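The two `wget` downloads above can also be written straight to their final locations. The helper below is a minimal sketch, not part of the MotionAgent repository: `fetch_ckpt` is a hypothetical name, and it assumes `wget` is installed and that you run it from the project root. It skips files that already exist and downloads to a temporary `.part` file so an interrupted transfer is never mistaken for a finished checkpoint.

```bash
# Hypothetical helper (not part of the MotionAgent repository).
# Usage: fetch_ckpt <url> <destination path>
fetch_ckpt() {
  local url="$1"
  local dest="$2"

  # Skip the download when a non-empty file is already in place.
  if [ -s "$dest" ]; then
    echo "skip: $dest already exists"
    return 0
  fi

  mkdir -p "$(dirname "$dest")"

  # Download to a temporary name, then rename only on success.
  wget -O "$dest.part" "$url" && mv "$dest.part" "$dest"
}

# Example calls (uncomment to download; URLs are the ones given above):
# fetch_ckpt https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth ckpts/groundingdino_swint_ogc.pth
# fetch_ckpt https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth ckpts/sam_vit_h_4b8939.pth
```

Because existing files are skipped, the calls are safe to re-run after a partial setup.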
138
+
139
+ After all downloads and installations, your `ckpts/` folder should look like this:
140
+
141
+ ```shell
142
+ ckpts/
143
+ ├── controlnet/
144
+ ├── stable-video-diffusion-img2vid-xt-1-1/
145
+ ├── groundingdino_swint_ogc.pth
146
+ ├── metric_depth_vit_small_800k.pth
147
+ └── sam_vit_h_4b8939.pth
148
+ ```
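To confirm the layout before running anything, a small check like the one below may help. It is a sketch only: `check_ckpts` is a hypothetical function, not a script shipped with MotionAgent, and it verifies only that the five top-level entries from the tree above exist, not that their contents are complete or valid.

```bash
# Hypothetical helper (not part of the MotionAgent repository).
# Usage: check_ckpts [ckpts directory]   (defaults to ./ckpts)
check_ckpts() {
  local root="${1:-ckpts}"
  local missing=0
  local entry

  for entry in \
    controlnet/ \
    stable-video-diffusion-img2vid-xt-1-1/ \
    groundingdino_swint_ogc.pth \
    metric_depth_vit_small_800k.pth \
    sam_vit_h_4b8939.pth
  do
    case "$entry" in
      # A trailing slash marks an expected directory.
      */) [ -d "$root/$entry" ] || { echo "missing directory: $root/$entry"; missing=1; } ;;
      # Anything else is an expected regular file.
      *)  [ -f "$root/$entry" ] || { echo "missing file: $root/$entry"; missing=1; } ;;
    esac
  done

  # Exit status 0 when everything is present, 1 otherwise.
  return "$missing"
}
```

Running `check_ckpts` from the project root prints one line per missing entry and returns a non-zero status if anything is absent, so it can gate a later command with `check_ckpts && ...`.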
149
+
150
+ ## 🚀 Running the Demos
151
+ TODO
152
+
153
+ ## 🔗 BibTeX
154
+ If you find [MotionAgent](https://arxiv.org/abs/2502.03207) useful for your research and applications, please cite using this BibTeX:
155
+ ```BibTeX
156
+ @article{liao2025motionagent,
157
+ title={{MotionAgent}: Fine-grained Controllable Video Generation via Motion Field Agent},
158
+ author={Liao, Xinyao and Zeng, Xianfang and Wang, Liao and Yu, Gang and Lin, Guosheng and Zhang, Chi},
159
+ journal={arXiv preprint arXiv:2502.03207},
160
+ year={2025}
161
+ }
162
+ ```
163
+
164
+ ## 🙏 Acknowledgements
165
+ We thank the authors of the following projects for their excellent open-source work:
166
+ - [MOFA-Video](https://github.com/MyNiuuu/MOFA-Video)
167
+ - [AppAgent](https://github.com/TencentQQGYLab/AppAgent)
168
+ - [Grounded-Segment-Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything)
169
+ - [Metric3D](https://github.com/YvanYin/Metric3D)
assets/agent.jpg ADDED

Git LFS Details

  • SHA256: 87b1ee92824ddb3614b88de5aef614af2441b8f7533cdd5d1d9c8ba49a97ef69
  • Pointer size: 131 Bytes
  • Size of remote file: 851 kB