Speech-driven animation using multi-modal hidden Markov models
Hofer, Gregor Otto
The main objective of this thesis was the synthesis of speech synchronised motion, in particular head motion. The hypothesis that head motion can be estimated from the speech signal was confirmed. In order to achieve satisfactory results, a motion capture data base was recorded, a definition of head motion in terms of articulation was discovered, a continuous stream mapping procedure was developed, and finally the synthesis was evaluated. Based on previous research into non-verbal behaviour basic types of head motion were invented that could function as modelling units. The stream mapping method investigated in this thesis is based on Hidden Markov Models (HMMs), which employ modelling units to map between continuous signals. The objective evaluation of the modelling parameters confirmed that head motion types could be predicted from the speech signal with an accuracy above chance, close to 70%. Furthermore, a special type ofHMMcalled trajectoryHMMwas used because it enables synthesis of continuous output. However head motion is a stochastic process therefore the trajectory HMM was further extended to allow for non-deterministic output. Finally the resulting head motion synthesis was perceptually evaluated. The effects of the “uncanny valley” were also considered in the evaluation, confirming that rendering quality has an influence on our judgement of movement of virtual characters. In conclusion a general method for synthesising speech-synchronised behaviour was invented that can applied to a whole range of behaviours.