Visual Prosody: Facial Movements Accompanying Speech
View/ Open
Date
05/2002Author
Graf, Hans Peter
Cosatto, Eric
Strom, Volker
Huang, Fu Jie
Metadata
Abstract
As we articulate speech, we usually move the head and exhibit various facial expressions. This visual aspect of speech aids understanding and helps communicating additional information, such as the speaker's mood. We analyze quantitatively head and facial movements that accompany speech and investigate how they relate to the text's prosodic structure. We recorded several hours of speech and measured the locations of the speakers' main facial features as well as their head poses. The text was evaluated with a prosody prediction tool, identifying phrase boundaries and pitch accents. Characteristic for most speakers are simple motion patterns that are repeatedly applied in synchrony with the main prosodic events. Direction and strength of head movements vary widely from one speaker to another, yet their timing is typically well synchronized with the spoken text. Understanding quantitatively the correlations between head movements and spoken text is important for synthesizing photo-realistic talking heads. Talking heads appear much more engaging when they exhibit realistic motion patterns