Informed Blending of Databases for Emotional Speech Synthesis

Hofer, Gregor O; Richmond, Korin; Clark, Robert A J

Files

hofer_emosyn.pdf (263.48 KB)

hofer_emosyn.ps (2.13 MB)

Abstract

The goal of this project was to build a unit selection voice that could portray emotions with varying intensities. A suitable definition of an emotion was developed, along with a descriptive framework that supported the work carried out. A single speaker was recorded portraying happy and angry speaking styles, and a neutral database was also recorded. A target cost function was implemented that chose units according to the emotion mark-up in the database. The Dictionary of Affect supported the emotional target cost function by providing an emotion rating for words in the target utterance. If a word was particularly 'emotional', units from that emotion were favoured. In addition, the intensity could be varied, which biased selection towards a greater number of emotional units. A perceptual evaluation was carried out, and subjects were able to reliably recognise emotions when varying amounts of emotional units were present in the target utterance.
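
The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cost formula, the `AFFECT_RATINGS` table (a stand-in for the Dictionary of Affect), the `Unit` class and every function name below are assumptions made purely for illustration. It shows only the general principle that a word's emotion rating and a global intensity setting can bias a target cost towards units from the emotional database.

```python
from dataclasses import dataclass

# Hypothetical word ratings standing in for the Dictionary of Affect:
# word -> (emotion, rating between 0 and 1). Values are invented.
AFFECT_RATINGS = {
    "wonderful": ("happy", 0.9),
    "hate": ("angry", 0.8),
}


@dataclass(frozen=True)
class Unit:
    """A candidate unit tagged with the emotion of its source database."""
    name: str
    emotion: str  # "neutral", "happy" or "angry"


def emotion_target_cost(unit, word, target_emotion, intensity):
    """Return an illustrative emotion cost (lower is better).

    The bias towards emotional units is the larger of the global
    intensity and the word's rating for the target emotion.
    """
    rated_emotion, rating = AFFECT_RATINGS.get(word.lower(), ("neutral", 0.0))
    word_bias = rating if rated_emotion == target_emotion else 0.0
    bias = max(intensity, word_bias)

    if unit.emotion == target_emotion:
        return 1.0 - bias  # cheap when the bias is high
    if unit.emotion == "neutral":
        return bias        # cheap when the bias is low
    return 1.0             # units from a different emotion are always costly


def select_unit(candidates, word, target_emotion, intensity):
    """Pick the candidate with the lowest emotion cost."""
    return min(
        candidates,
        key=lambda u: emotion_target_cost(u, word, target_emotion, intensity),
    )


candidates = [
    Unit("u_neutral", "neutral"),
    Unit("u_happy", "happy"),
    Unit("u_angry", "angry"),
]
```

In this sketch, a strongly rated word such as "wonderful" pulls in a happy unit even at zero intensity, while an unrated word only receives an emotional unit once the intensity setting exceeds 0.5. A real unit selection system would combine such a term with the usual phonetic and prosodic target costs and with join costs.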

URI

http://www.isca-speech.org/archive/interspeech_2005
http://hdl.handle.net/1842/920

This item appears in the following Collection(s)

CSTR publications