Global human motion given monocular camera assumptions: from known, static to unknown and moving
dc.contributor.advisor
Fisher, Robert
dc.contributor.advisor
Komura, Taku
dc.contributor.advisor
Shiratori, Takaaki
dc.contributor.author
Habekost, Julian
dc.date.accessioned
2024-06-13T10:27:04Z
dc.date.available
2024-06-13T10:27:04Z
dc.date.issued
2024-06-13
dc.description.abstract
In this doctoral thesis, I present a body of work on estimating 3D global human motion from monocular videos under different camera assumptions by learning from motion capture data. The classical monocular 3D human pose estimation task is concerned only with root-relative poses, here called local poses. Local human poses do not move through space and are of limited use for motion-capture-like applications, e.g. for a character in a game or animated movie. The relationship between local and global human poses is tied to the camera projection and the camera's position or motion. Chapter 3 proposes a generative model based on adversarial learning that learns the projection of human motion of a known but unseen camera. We are the first to introduce a differentiable egocentrisation in order to embed global human motion into a neural prior. We show that this approach exceeds the performance of other camera domain adaptation methods by comparing them in the local pose space. We are the first to show that the model’s knowledge of the ground plane and the projection plane also improves the local 3D pose quality. In chapter 4 we train a supervised model on synthetically rendered humans in sequences of arbitrary length. If we can assume that the subject moves on an unknown ground plane and that the camera is static but unknown, we show that we can infer human motion and even camera intrinsics and extrinsics. In chapter 5 we adapt a generative model based on a conditional variational autoencoder (cVAE) to allow the subject to traverse terrain under an unknown moving camera. The moving camera estimation is supported by a classic feature-matching visual odometry approach. We are the first to show that a neural model of global motion on terrain can enable and enhance the performance of simple feature-matching-based visual odometry.
We record a large dataset of two subjects moving over obstacles and on flat ground while being filmed with a handheld camera with different fields of view. This allows us to analyse under which circumstances the model performs best, specifically with respect to the estimation of camera intrinsics and motion.
en
dc.identifier.uri
https://hdl.handle.net/1842/41879
dc.identifier.uri
http://dx.doi.org/10.7488/era/4602
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Julian Habekost, Takaaki Shiratori, Yuting Ye, Taku Komura: Learning 3D Global Human Motion Estimation from Unpaired, Disjoint Datasets. British Machine Vision Conference (BMVC) 2020.
en
dc.relation.hasversion
Habekost, J., Pang, K., Shiratori, T., and Komura, T. (2022). From synthetic to one shot regression of camera-agnostic human performances. In El Yacoubi, M., Granger, E., Yuen, P. C., Pal, U., and Vincent, N., editors, Pattern Recognition and Artificial Intelligence, pages 514–525, Cham. Springer International Publishing.
en
dc.subject
3D global human motion
en
dc.subject
local human poses
en
dc.subject
differentiable egocentrisation
en
dc.subject
conditional variational autoencoder
en
dc.subject
cVAE
en
dc.subject
moving camera estimation
en
dc.subject
visual odometry
en
dc.title
Global human motion given monocular camera assumptions: from known, static to unknown and moving
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: Habekost2024.pdf
- Size: 69.46 MB
- Format: Adobe Portable Document Format