A Deep Learning Framework for Character Motion Synthesis and Editing
Daniel Holden
University of Edinburgh
Jun Saito
Marza Animation Planet
Taku Komura
University of Edinburgh
Figure 1: Our framework allows the animator to synthesize character movements automatically from given trajectories.
Abstract
We present a framework to synthesize character movements from
high level parameters, such that the produced movements respect
the manifold of human motion, learned from a large motion capture
dataset. The motion manifold, represented by the hidden units of a
convolutional autoencoder, encodes motion data as sparse components
that can be combined to produce a wide range of complex movements.
To map from high level parameters to the motion manifold, we stack
a deep feedforward neural
network on top of the trained autoencoder. This network is trained
to produce realistic motion sequences from parameters such as a
curve over the terrain that the character should follow, or a target
location for punching and kicking. The feedforward control net-
work and the motion manifold are trained independently, allowing
the user to easily switch between feedforward networks according
to the desired interface, without re-training the motion manifold.
Once motion is generated it can be edited by performing optimiza-
tion in the space of the motion manifold. This allows for imposing
kinematic constraints, or transforming the style of the motion, while
ensuring the edited motion remains natural. As a result, the system
can produce smooth, high quality motion sequences without any
manual pre-processing of the training data.
Keywords: deep learning, convolutional neural networks, autoen-
coder, human motion, character animation, manifold learning
Concepts: • Computing methodologies → Motion capture;
1 Introduction
Data-driven motion synthesis allows animators to produce con-
vincing character movements from high level parameters. Such
approaches greatly help animation production as animators only
need to provide high level instructions rather than low level details
through keyframes. Various techniques that make use of large mo-
tion capture datasets and machine learning to parameterize motion
have been proposed in computer animation.
Most data-driven approaches currently available require a signifi-
cant amount of manual data preprocessing, including motion seg-
mentation, alignment, and labeling. A mistake at any stage can
easily result in a failure of the final animation. Such preprocess-
ing is therefore usually carefully performed through a significant
amount of human intervention, making sure the output movements
appear smooth and natural. This makes full automation difficult,
and as a result these systems often require dedicated technical
developers to maintain them.
In this paper, we propose a model of animation synthesis and edit-
ing based on a deep learning framework, which can automatically
learn an embedding of motion data in a non-linear manifold using a
large set of human motion data with no manual data preprocessing
or human intervention. We train a convolutional autoencoder on a
large motion database such that it can reproduce the motion data
given as input, as well as synthesize novel motion via interpola-
tion. This unsupervised, non-linear manifold learning process does
not require any motion segmentation or alignment, which makes it
significantly easier to apply than previous approaches. On top of
this autoencoder we stack another feedforward neural network that
maps high level parameters to low level human motion, as repre-
sented by the hidden units of the autoencoder. With this, users can
easily produce realistic human motion sequences from intuitive in-
puts such as a curve over some terrain that the character should fol-
low, or the trajectory of the end effectors for punching and kicking.
As the feedforward control network and the motion manifold are
trained independently, users can easily swap and re-train the feed-
forward network according to the desired interface. Our approach
is also inherently parallel, which makes it very fast to compute and
a good fit for mainstream animation packages.
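To make the described structure concrete, the following is a minimal
sketch, not the authors' implementation: a one-dimensional convolutional
autoencoder over the temporal axis, with a feedforward control network
stacked on its hidden units. It assumes PyTorch, and the layer widths,
filter sizes, degrees of freedom per frame, and control dimensionality
(DOF, HIDDEN, PARAMS) are illustrative placeholders.

```python
# Minimal sketch of the two-stage architecture (illustrative sizes only).
import torch
import torch.nn as nn

DOF = 73      # assumed degrees of freedom per motion frame
HIDDEN = 256  # assumed number of hidden units in the motion manifold
PARAMS = 7    # assumed dimensionality of the high level control curve

class MotionAutoencoder(nn.Module):
    """Convolutional autoencoder whose hidden units form the motion manifold."""
    def __init__(self):
        super().__init__()
        # Encoder: 1D convolution over time; pooling halves the temporal axis.
        self.encode = nn.Sequential(
            nn.Conv1d(DOF, HIDDEN, kernel_size=25, padding=12),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # Decoder: upsample back to the original frame rate and reconstruct.
        self.decode = nn.Sequential(
            nn.Upsample(scale_factor=2),
            nn.Conv1d(HIDDEN, DOF, kernel_size=25, padding=12),
        )

    def forward(self, x):  # x: (batch, DOF, frames)
        return self.decode(self.encode(x))

class ControlNetwork(nn.Module):
    """Feedforward network mapping high level parameters to manifold units."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(PARAMS, HIDDEN, kernel_size=25, padding=12),
            nn.ReLU(),
            nn.Conv1d(HIDDEN, HIDDEN, kernel_size=25, padding=12),
            nn.ReLU(),
        )

    def forward(self, p):  # p: (batch, PARAMS, frames / 2)
        return self.net(p)

# Synthesis: map control parameters onto the manifold, then decode.
# autoencoder = MotionAutoencoder(); control = ControlNetwork()
# motion = autoencoder.decode(control(trajectory_params))
```

Because the two networks are trained separately, only ControlNetwork need
be replaced or re-trained when the control interface changes.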
We also propose techniques to edit the motion data in the space
of the motion manifold. The hidden units of the convolutional au-
toencoder represent the motion in a sparse and continuous fashion,
such that adjusting the data in this space preserves the naturalness
and smoothness of the motion, while still allowing complex move-
ments of the body to be reproduced. One demonstrative example of
this editing is to combine the style of one motion with the timing
of another by minimizing the difference between the Gram matrices
of the hidden units of the synthesized motion and those of the
reference motion.
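As an illustrative sketch of this editing, assuming the MotionAutoencoder
above, the following optimizes the hidden units so that their Gram matrix
matches that of a style motion while staying close to the hidden units of
a content motion; the equal loss weighting, step count, and learning rate
are illustrative assumptions, not the paper's settings.

```python
# Gram matrix style editing in the space of the motion manifold (sketch).
import torch

def gram(h):
    """Gram matrix of hidden unit activations h: (batch, units, frames)."""
    return torch.einsum('bif,bjf->bij', h, h) / h.shape[-1]

def style_transfer(autoencoder, content, style, steps=100, lr=0.1):
    """Optimize hidden units so their Gram matrix matches the style motion
    while their values stay close to the content (timing) motion."""
    h_content = autoencoder.encode(content).detach()
    g_style = gram(autoencoder.encode(style)).detach()
    h = h_content.clone().requires_grad_(True)
    opt = torch.optim.Adam([h], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((gram(h) - g_style) ** 2).mean() + ((h - h_content) ** 2).mean()
        loss.backward()
        opt.step()
    return autoencoder.decode(h).detach()
```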