Neural motion synthesis of locomotion, interaction, and manipulation
Abstract
Motion synthesis is the task of automatically generating realistic character movements according to user instructions. To produce high-quality character animations,
data-driven approaches that learn from motion capture data have been widely adopted.
Driven by the recent progress of deep neural networks, groundbreaking success has
also been achieved by combining deep learning techniques with motion capture
data. Compared with traditional methods, deep neural network-based approaches can
synthesize controllable and high-quality motions with lower computing and memory
costs. While carrying these advantages, existing neural network-based approaches are
mainly targeted to locomotion control of humanoid characters. Meanwhile, there are
still many open challenges when synthesizing (1) non-human character motions, e.g.,
those of quadrupeds and (2) complex human motions such as interaction with the environment and manipulation of objects. For these reasons, in this thesis, we aim to
further investigate deep learning techniques for synthesizing complex motions including quadruped locomotion, humanoid-environment interactions and dexterous hand
manipulation.
We first introduce a Mode-Adaptive Neural Network structure for controlling
quadruped characters. The framework is composed of a motion prediction network
and a gating network. At each frame, the motion prediction network computes the
current character state given the state in the previous frame and the user-provided control signals. The gating network dynamically updates the weights of the
motion prediction network by selecting and blending a group of expert weights. With
this mechanism, each combination of experts only needs to focus on
learning a small subset of motions. Thanks to this increased flexibility, the framework can
be trained with unstructured dog motion capture data and can synthesize, in real time, quadruped motions with a wide variety of locomotion modes across both non-periodic and periodic
actions.
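As a rough illustration of this expert-blending mechanism, the sketch below shows a gating network producing blend coefficients and a linear layer whose parameters are a blend of expert parameters. This is a minimal PyTorch-style sketch: the layer sizes, the number of experts, and names such as GatingNetwork and ModeAdaptiveLayer are illustrative assumptions, not the exact thesis configuration.

```python
import torch
import torch.nn as nn

class GatingNetwork(nn.Module):
    """Predicts blending coefficients over a set of experts (illustrative sizes)."""
    def __init__(self, in_dim, num_experts, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ELU(),
            nn.Linear(hidden, num_experts),
        )

    def forward(self, x):
        # Softmax so the expert coefficients form a convex combination.
        return torch.softmax(self.net(x), dim=-1)

class ModeAdaptiveLayer(nn.Module):
    """One layer of the motion prediction network whose weights are formed,
    per frame, by blending expert weights with the gating coefficients."""
    def __init__(self, in_dim, out_dim, num_experts):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_experts, out_dim, in_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(num_experts, out_dim))

    def forward(self, x, blend):
        # blend: (batch, num_experts) coefficients from the gating network.
        W = torch.einsum('be,eoi->boi', blend, self.weight)  # blended weights
        b = torch.einsum('be,eo->bo', blend, self.bias)      # blended biases
        return torch.einsum('boi,bi->bo', W, x) + b
```

At run time, the gating network is evaluated first on features of the previous character state, and its coefficients parameterize every layer of the motion prediction network before the next state is predicted.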
Secondly, we present the Neural State Machine, a goal-directed controller that can
produce both locomotion and close interaction motions between humanoid characters
and objects/environments. In addition to utilizing the gating structure proposed in the
previous work, we apply a control scheme that combines egocentric inference and
goal-centric inference to increase the precision of the interactions. To let characters
adapt to a wide range of geometry, we incorporate a volumetric representation for
understanding the environment and an efficient data augmentation scheme that randomly
switches the 3D geometry while maintaining the context of the original motion. We
demonstrate the versatility of our model with various scene interaction tasks such as
sitting on a chair, avoiding obstacles, opening a door and entering through it, and carrying
objects, all generated in real time from a single model.
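As a rough illustration, the sketch below shows the kind of volumetric sensor such a representation relies on: a coarse voxel grid centred on the character whose cells are marked occupied when they fall inside the surrounding geometry. This is a minimal sketch; the grid resolution, extent, and the query_point_is_inside callable are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def environment_occupancy(query_point_is_inside, center, grid_size=8, extent=1.0):
    """Sample a cubic voxel grid around the character and mark which cells
    intersect the scene geometry (illustrative resolution and extent).

    query_point_is_inside: callable taking a 3D point and returning True if
        the point lies inside the environment geometry; it stands in for a
        collision/occupancy query against the actual scene.
    center: 3D position (NumPy array) around which the grid is sampled.
    """
    coords = np.linspace(-extent, extent, grid_size)
    occupancy = np.zeros((grid_size, grid_size, grid_size), dtype=np.float32)
    for i, x in enumerate(coords):
        for j, y in enumerate(coords):
            for k, z in enumerate(coords):
                p = center + np.array([x, y, z])
                occupancy[i, j, k] = 1.0 if query_point_is_inside(p) else 0.0
    # Flattened, this becomes part of the network input describing the
    # geometry surrounding the character.
    return occupancy.reshape(-1)
```

Under this view, the data augmentation scheme amounts to swapping the geometry behind the occupancy query while keeping the recorded motion, so the same clip can be paired with many different volumetric inputs.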
Finally, we tackle the problem of in-hand manipulation, for which we introduce the ManipNet framework combined with a novel hand-object spatial representation to synthesize dexterous finger movements. The hand-object spatial representation combines
the global object shape as voxel occupancies with local geometric details as samples
of closest distances. At each frame, we provide the network with the current finger
pose, past and future trajectories, and the spatial representations extracted from these
trajectories. The network then autoregressively predicts the finger pose for the next frame. With a carefully chosen hand-centric coordinate system, we can
handle single-handed and two-handed motions in a unified framework. We demonstrate that the network is able to synthesize a variety of finger gaits for grasping, in-hand
manipulation, and bimanual object handling on a rich set of novel shapes and functional tasks, despite being trained only on a small number of primitive shapes and kitchenware
objects.
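As a rough illustration of the hand-object spatial representation, the sketch below assembles a coarse voxel occupancy of the object around the hand together with closest-distance samples taken from points attached to the hand. This is a minimal sketch: the grid size, the sensor points, and the distance_to_object callable are illustrative assumptions standing in for the actual geometry queries.

```python
import numpy as np

def hand_object_features(distance_to_object, hand_center, sensor_points,
                         grid_size=4, extent=0.15):
    """Build a two-part hand-object spatial feature in the hand-centric frame.

    distance_to_object: callable returning the signed distance from a 3D point
        to the object surface (negative inside); stands in for a distance query
        against the actual object mesh.
    sensor_points: 3D points attached to the hand (e.g. on the fingers) from
        which local closest-distance samples are taken.
    """
    # Global part: coarse voxel occupancy of the object around the hand.
    coords = np.linspace(-extent, extent, grid_size)
    grid = np.stack(np.meshgrid(coords, coords, coords, indexing='ij'), axis=-1)
    cells = grid.reshape(-1, 3) + hand_center
    occupancy = np.array([1.0 if distance_to_object(p) <= 0.0 else 0.0
                          for p in cells], dtype=np.float32)

    # Local part: closest distances from hand-attached sensor points to the object.
    closest = np.array([distance_to_object(p) for p in sensor_points],
                       dtype=np.float32)

    return np.concatenate([occupancy, closest])
```

Per frame, features of this kind, extracted along the past and future trajectories, are concatenated with the current finger pose and fed to the network, whose predicted pose is fed back as input at the next frame.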