Neural motion synthesis of locomotion, interaction, and manipulation

Zhang, He

Files

Zhang2022.pdf (133.91 MB)

Date

2022-10-04

Abstract

Motion synthesis is the task of automatically generating realistic character movements from user instructions. To produce high-quality character animation, data-driven approaches that learn from motion capture data have been widely adopted. Driven by recent progress in deep neural networks, combining deep learning techniques with motion capture data has also led to groundbreaking results. Compared with traditional methods, approaches based on deep neural networks can synthesize controllable, high-quality motion at lower computational and memory cost. Despite these advantages, existing neural network-based approaches mainly target locomotion control of humanoid characters. Many open challenges remain when synthesizing (1) the motion of non-human characters, e.g., quadrupeds, and (2) complex human motion such as interaction with the environment and manipulation of objects. For these reasons, this thesis further investigates deep learning techniques for synthesizing complex motion, including quadruped locomotion, humanoid-environment interaction, and dexterous hand manipulation. We first introduce the Mode-Adaptive Neural Network, a network structure for controlling quadruped characters. This framework is composed of a motion prediction network and a gating network. At each frame, the motion prediction network computes the character state in the current frame from the state in the previous frame and the user-provided control signals. The gating network dynamically updates the weights of the motion prediction network by selecting and blending a group of expert weights. Through this mechanism, each specific combination of experts only needs to learn a small subset of motions.
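The expert-blending step described above can be sketched in a few lines. This is a minimal, hypothetical illustration in plain Python, not the thesis implementation: it assumes the gating network's raw outputs (`gating_logits`) are already available, turns them into blending coefficients with a softmax, blends the expert weight matrices, and applies the result as a single linear layer to the concatenated previous state and control signal. The function names, the single-layer prediction network, and the absence of activation functions are all simplifying assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    """Turn raw gating outputs into non-negative blending coefficients that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_experts(experts: List[Matrix], alphas: List[float]) -> Matrix:
    """Element-wise weighted sum of expert weight matrices: W = sum_i alpha_i * W_i."""
    rows, cols = len(experts[0]), len(experts[0][0])
    return [
        [sum(a * W[r][c] for a, W in zip(alphas, experts)) for c in range(cols)]
        for r in range(rows)
    ]


def linear(W: Matrix, b: List[float], x: List[float]) -> List[float]:
    """Affine map y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def predict_next_state(
    prev_state: List[float],
    control: List[float],
    gating_logits: List[float],  # assumed output of a separate gating network
    experts: List[Matrix],
    bias: List[float],
) -> List[float]:
    """One frame of a hypothetical single-layer mode-adaptive prediction network."""
    alphas = softmax(gating_logits)
    W = blend_experts(experts, alphas)  # weights are re-blended every frame
    return linear(W, bias, prev_state + control)


# Usage: two experts mapping [state (2 values) + control (1 value)] -> next state (2 values).
experts = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # expert 0: copies the previous state
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # expert 1: follows the control signal
]
next_state = predict_next_state([2.0, 4.0], [6.0], [0.0, 0.0], experts, [0.0, 0.0])
print(next_state)  # [4.0, 5.0] with equal gating
```

Because the blend is recomputed every frame from the gating output, the effective weights of the prediction network change with the gating signal rather than staying fixed, which is what allows different expert combinations to specialise in different subsets of motion.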
Owing to this increased flexibility, the framework can be trained on unstructured dog motion capture data and can synthesize, in real time, quadruped motion with a wide variety of locomotion modes spanning both periodic and non-periodic actions. Second, we present the Neural State Machine, a goal-directed controller that can produce both locomotion and close interactions between humanoid characters and objects or environments. In addition to using the gating structure proposed in the previous work, we apply a control scheme that combines egocentric inference and goal-centric inference to increase the precision of the interactions. To let characters adapt to a wide range of geometry, we incorporate a volumetric representation for understanding the environment and an efficient data augmentation scheme that randomly switches the 3D geometry while preserving the context of the original motion. We demonstrate the versatility of our model on various scene interaction tasks, such as sitting on a chair, avoiding obstacles, opening and passing through a door, and carrying objects, all generated in real time by a single model. Finally, we tackle the problem of in-hand manipulation by introducing the ManipNet framework, combined with a novel hand-object spatial representation, to synthesize dexterous finger movements. The hand-object spatial representation combines the global object shape, encoded as voxel occupancies, with local geometric details, encoded as samples of closest distances. At each frame, the network receives the current finger pose, the past and future trajectories, and the spatial representations extracted from these trajectories. The network then autoregressively predicts the finger pose for the next frame. With a carefully chosen hand-centric coordinate system, single-handed and two-handed motions can be handled in a unified framework.
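The two components of the hand-object spatial representation (global voxel occupancy and local closest distances) can be illustrated with a small sketch. It assumes, for brevity, that the object surface is approximated by a point cloud already expressed in a hand-centric frame, that the occupancy grid is a cube centred on the hand, and that the hand is summarised by a few sample points; exact surface distances would require a mesh. `voxel_occupancy`, `closest_distances`, `hand_object_features`, the grid size, and the cell size are all hypothetical names and values, not the thesis implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def voxel_occupancy(
    object_points: Sequence[Point], center: Point, grid_size: int, cell: float
) -> List[int]:
    """Flattened grid_size**3 occupancy grid centred at `center` (global shape).

    A voxel is 1 if at least one object point falls inside it; points outside
    the grid are ignored.
    """
    half = grid_size * cell / 2.0
    occ = [0] * (grid_size ** 3)
    for p in object_points:
        idx = []
        for k in range(3):
            i = math.floor((p[k] - center[k] + half) / cell)
            if i < 0 or i >= grid_size:
                break  # point lies outside the grid along axis k
            idx.append(i)
        else:  # reached only if all three indices are inside the grid
            x, y, z = idx
            occ[(x * grid_size + y) * grid_size + z] = 1
    return occ


def closest_distances(
    hand_points: Sequence[Point], object_points: Sequence[Point]
) -> List[float]:
    """For each hand sample point, the distance to the nearest object point (local detail)."""
    return [min(math.dist(h, p) for p in object_points) for h in hand_points]


def hand_object_features(hand_points, object_points, center, grid_size=4, cell=0.05):
    """Concatenate global (occupancy) and local (distance) features into one vector."""
    occ = voxel_occupancy(object_points, center, grid_size, cell)
    return [float(v) for v in occ] + closest_distances(hand_points, object_points)


# Usage with a toy object of two points and one hand sample point (hypothetical units: metres).
obj = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
feats = hand_object_features([(0.03, 0.04, 0.0)], obj, center=(0.0, 0.0, 0.0))
print(len(feats))  # 4**3 occupancy values + 1 distance = 65
```

The occupancy grid stays coarse and fixed-size regardless of object complexity, while the distance samples add fine detail near the hand; concatenating them gives a fixed-length input vector per frame.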
We demonstrate that the network can synthesize a variety of finger gaits for grasping, in-hand manipulation, and bimanual object handling on a rich set of novel shapes and functional tasks, despite being trained on only a small number of primitive shapes and kitchenware objects.
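The autoregressive inference used by ManipNet, in which each predicted finger pose becomes the input pose for the next frame, can be sketched as a simple loop. This is a hedged illustration only: `predict` stands in for the trained network, and `frame_inputs` stands in for the per-frame trajectory and spatial features; both are hypothetical placeholders. The toy predictor in the usage example simply moves each joint value halfway toward a per-frame target.

```python
from typing import Callable, List, Sequence

Pose = List[float]


def rollout(
    predict: Callable[[Pose, Sequence[float]], Pose],
    initial_pose: Pose,
    frame_inputs: Sequence[Sequence[float]],
) -> List[Pose]:
    """Autoregressive rollout: the pose predicted at frame t is the input pose at frame t+1."""
    poses = [list(initial_pose)]
    for inputs in frame_inputs:
        poses.append(predict(poses[-1], inputs))
    return poses


def toy_predict(pose: Pose, target: Sequence[float]) -> Pose:
    """Hypothetical stand-in for a trained network: move each joint value halfway toward a target."""
    return [p + 0.5 * (t - p) for p, t in zip(pose, target)]


poses = rollout(toy_predict, [0.0, 1.0], [[1.0, 1.0], [1.0, 1.0]])
print(poses)  # [[0.0, 1.0], [0.5, 1.0], [0.75, 1.0]]
```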

URI

https://hdl.handle.net/1842/39406
http://dx.doi.org/10.7488/era/2656

This item appears in the following Collection(s)

Informatics thesis and dissertation collection