<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<title>UDE: A Unified Driving Engine for Human Motion Generation</title>
<meta name="description" content="">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="./assets/bootstrap.min.css">
<link rel="stylesheet" href="./assets/font-awesome.min.css">
<link rel="stylesheet" href="./assets/codemirror.min.css">
<link rel="stylesheet" href="./assets/app.css">
</head>
<body>
<div class="container" id="main">
<div class="row">
<h1 class="col-md-12 text-center">
<br>
<b>UDE</b>: A Unified Driving Engine for Human Motion Generation<br>
<!-- <small>
CVPR 2022 (Oral Presentation)
</small> -->
</h1>
<hr style="margin-top:0px">
</div>
<div class="row">
<div class="col-md-12 text-center">
<ul class="list-inline">
<li>
<a href="https://zixiangzhou916.github.io/" style="font-size: 16px;">
Zixiang Zhou
</a>
<!-- <sup>1</sup> -->
</li>
<li>
<a href="https://sites.google.com/site/zjuwby/?pli=1" style="font-size: 16px;">
Baoyuan Wang
</a>
<!-- <sup>1</sup> -->
</li><br>
<li>
<!-- <sup>1</sup> -->
<a href="https://www.xiaoice.com/" style="font-size: 16px;">
Xiaobing.ai
</a>
</li>
</ul>
</div>
</div>
<div class="row">
<div class="col-md-4 col-md-offset-4 text-center">
<ul class="nav nav-pills nav-justified">
<li>
<a href="http://arxiv.org/abs/2211.16016">
<img src="./assets/paper-1.png" height="60px">
<h4><strong>Paper</strong></h4>
</a>
</li>
<li>
<!-- <a onClick="alert('Code coming soon!\nContact dengyu2008@hotmail.com for more details.')"> -->
<!-- <a href="https://github.com/dorniwang/PD-FGC"> -->
<a>
<img src="./assets/github.png" height="60px">
<h4><strong>Code (coming soon)</strong></h4>
</a>
</li>
</ul>
</div>
</div>
<div class="row">
<div class="col-md-12 col-md-offset-0 text-center">
<a>
<!-- <video style="width:100%;height:100%;" playsinline autoplay loop preload muted>
<source src="./files/cover.mp4" type="video/mp4">
</video> -->
<img src="./assets/teaser.png" class="img-responsive" alt="teaser"><br>
</a>
<p class="text-justify" style="font-size: 16px;">
Our shared Unified Driving Engine (UDE) supports both text-driven and audio-driven human motion generation. The left shows a motion sequence driven by a text description; the right shows one driven by an LA Hip-hop music clip.
</p>
<br>
<h2>
Abstract
</h2>
<hr style="margin-top:0px">
<p class="text-justify" style="font-size: 16px;">
Generating controllable and editable human motion sequences is a key challenge in 3D avatar creation. Until learning-based approaches were developed and applied in recent years, generating and animating human motion was labor-intensive. However, these approaches are still task-specific or modality-specific. In this paper, we propose “UDE”, the first unified driving engine that enables generating human motion sequences from either natural language or audio. Specifically, UDE consists of four key components: 1) a motion quantization module based on VQ-VAE that represents a continuous motion sequence as discrete latent codes; 2) a modality-agnostic transformer encoder that learns to map modality-aware driving signals to a joint space; 3) a unified token transformer (a GPT-like network) that predicts the quantized latent code indices in an auto-regressive manner; and 4) a diffusion motion decoder that takes the motion tokens as input and decodes them into motion sequences with high diversity. We evaluate our method on the HumanML3D and AIST++ benchmarks, and the experimental results demonstrate that it achieves state-of-the-art performance.
</p>
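To make the first component concrete, here is a toy sketch (our illustration, not the authors' released code) of how a VQ-VAE-style codebook turns a continuous motion feature sequence into discrete token indices; the codebook size, latent dimension, and random data are invented for the example.

```python
# Illustrative sketch: VQ-VAE-style quantization of motion latents.
# The codebook and latents here are random toy values, not trained weights.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))       # 512 codes, each a 64-D latent
motion_latents = rng.normal(size=(30, 64))  # 30 frames of encoded motion

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    # Squared Euclidean distance from every latent to every code: (30, 512).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

tokens = quantize(motion_latents, codebook)
print(tokens.shape)  # (30,) -- one discrete motion token per frame
```

The resulting index sequence is what the downstream transformer predicts, which is what lets a single GPT-like model handle motion regardless of the driving modality.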
</div>
</div>
<div class="row">
<div class="col-md-12 col-md-offset-0 text-center">
<br>
<h2>
Video
</h2>
<hr style="margin-top:0px">
<div class="text-center">
<div style="position:relative;padding-top:56.25%;">
<iframe src="https://www.youtube.com/embed/CaG1PTvzkxA" allowfullscreen=""
style="position:absolute;top:0;left:0;width:100%;height:100%;"></iframe>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-md-12 col-md-offset-0 text-center">
<br>
<h2>
Overview
</h2>
<hr style="margin-top:0px">
<img src="./assets/overview.png" class="img-responsive" alt="overview"><br>
<p class="text-justify" style="font-size: 16px;">
An overview of our method. Our model consists of four key components. First, we train a codebook using a VQ-VAE, in which each code represents a certain pattern of the motion sequence. Second, we introduce a Modality-Agnostic Transformer Encoder (MATE), which takes input of different modalities and transforms it into sequential embeddings in one joint space. The third component is a Unified Token Transformer (UTT): we feed it the sequential embeddings obtained by MATE and predict the motion token sequence in an auto-regressive manner. The fourth component is a Diffusion Motion Decoder (DMD). Unlike recent works, which are modality-specific, our DMD is modality-agnostic: given the motion token sequence, DMD encodes it into semantically rich embeddings and then decodes them into motion sequences in continuous space via the reverse diffusion process.
</p>
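The four-stage pipeline above can be sketched end to end. This is a conceptual toy, not the released implementation: every network is replaced by a stub function, and all names, shapes, and the greedy decoding choice are our assumptions for illustration only.

```python
# Conceptual sketch of the UDE pipeline with toy stand-in networks:
# driving signal -> MATE (joint embedding) -> UTT (auto-regressive
# motion tokens) -> DMD (continuous motion). All three are stubs.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 512  # size of the VQ-VAE motion codebook

def encode_signal(signal):
    """Stand-in for MATE: project any modality into the joint space."""
    return signal.mean(axis=0)  # toy pooling to one context vector

def next_token_logits(context, history):
    """Stand-in for UTT: score the next motion token given the context."""
    return rng.normal(size=VOCAB)  # toy logits

def decode_motion(tokens):
    """Stand-in for DMD: map token indices back to continuous poses."""
    return np.stack([np.full(64, t / VOCAB) for t in tokens])

signal = rng.normal(size=(100, 32))  # e.g. audio or text features
context = encode_signal(signal)

tokens = []
for _ in range(30):  # greedy auto-regressive generation
    tokens.append(int(next_token_logits(context, tokens).argmax()))

motion = decode_motion(tokens)
print(motion.shape)  # (30, 64): 30 frames of 64-D poses
```

Because only `encode_signal` sees the raw modality, swapping text features for audio features leaves the token transformer and decoder untouched, which is the unification the method is named for.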
</div>
</div>
<div class="row">
<div class="col-md-12 col-md-offset-0 text-center">
<br>
<h2>
Acknowledgements
</h2>
<hr style="margin-top:0px">
<p class="text-justify" style="font-size: 16px;">
<!-- We thank Harry Shum for the fruitful advice and discussion to improve the paper. <br> -->
The website template was adapted from <a href="https://yudeng.github.io/GRAM/">GRAM</a>.
</p>
</div>
</div>
</body>
</html>