Global human motion given monocular camera assumptions: from known, static to unknown and moving

Habekost, Julian

Files

Habekost2024.pdf (69.46 MB)

Date

2024-06-13

Authors

Habekost, Julian

Abstract

In this doctoral thesis, I present a body of work on estimating 3D global human motion from monocular videos under different camera assumptions by learning from motion capture data. The classical monocular 3D human pose estimation task is only concerned with root-relative poses, here called local poses. Local human poses do not traverse space and are therefore of limited use for motion-capture-like applications, e.g. animating a character in a game or animated movie. The relationship between local and global human poses is conceptually connected to the camera projection and the camera's position or motion. Chapter 3 proposes a generative model based on adversarial learning that learns the projection of human motion for a known but unseen camera. We are the first to introduce a differentiable egocentrisation in order to embed global human motion into a neural prior. We show that this approach exceeds the performance of other camera domain adaptation methods by comparing them in the local pose space. We are the first to show that the model’s knowledge of the ground plane and the projection plane also improves the local 3D pose quality. In Chapter 4 we train a supervised model on synthetically rendered humans in sequences of arbitrary length. We show that, assuming the subject moves on an unknown ground plane and the camera is static but unknown, we can infer the human motion and even the camera intrinsics and extrinsics. In Chapter 5 we adapt a generative model based on a conditional variational autoencoder (cVAE) to handle a subject traversing terrain under an unknown moving camera. The moving camera estimation is supported by a classic feature-matching visual odometry approach. We are the first to show that a neural model of global motion on terrain can enable and enhance the performance of simple feature-matching-based visual odometry.
We record a large dataset of two subjects moving over obstacles and on flat ground while being filmed with a handheld camera with different fields of view. This allows us to analyse under which circumstances the model performs best, specifically with respect to the estimation of camera intrinsics and motion.

URI

https://hdl.handle.net/1842/41879
http://dx.doi.org/10.7488/era/4602

This item appears in the following Collection(s)

Informatics thesis and dissertation collection