Informed Blending of Databases for Emotional Speech Synthesis
Abstract
The goal of this project was to build a unit selection voice
that could portray emotions with varying intensities. A suitable
definition of an emotion was developed along with a descriptive
framework that supported the work carried out. A single
speaker was recorded portraying happy and angry speaking
styles. A neutral database was also recorded. A
target cost function was implemented that chose units according
to emotion mark-up in the database. The Dictionary of Affect
supported the emotional target cost function by providing an
emotion rating for words in the target utterance. If a word was
particularly 'emotional', units from that emotion were favoured.
In addition, the intensity could be varied, which biased the
selection towards a greater number of emotional units. A perceptual
evaluation was carried out, and subjects were able to recognise the
emotions reliably when varying numbers of emotional units were
present in the target utterance.
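
The emotional target cost described above can be illustrated with a minimal sketch. It is not the implementation used in the project: the ratings table, the threshold, the rule by which intensity lowers that threshold, and the names Unit, emotional_target_cost and select_unit are all assumptions made for illustration, and the ratings are invented rather than taken from the Dictionary of Affect.

```python
from dataclasses import dataclass

# Hypothetical word ratings in [0, 1]; invented values, NOT from the
# Dictionary of Affect.
AFFECT_RATINGS = {
    "wonderful": 0.9,
    "terrible": 0.85,
    "table": 0.1,
}


@dataclass(frozen=True)
class Unit:
    """A candidate unit with the emotion mark-up of its source database."""
    name: str
    emotion: str  # "neutral", "happy" or "angry"


def emotional_target_cost(unit, word, target_emotion, intensity, threshold=0.5):
    """Return 0.0 if the unit's emotion is the desired one, else 1.0.

    Assumed rule: a higher intensity lowers the rating a word needs
    before emotional units are favoured, so more emotional units
    are selected overall.
    """
    rating = AFFECT_RATINGS.get(word.lower(), 0.0)
    effective_threshold = threshold * (1.0 - intensity)
    wants_emotion = intensity > 0.0 and rating >= effective_threshold
    desired = target_emotion if wants_emotion else "neutral"
    return 0.0 if unit.emotion == desired else 1.0


def select_unit(candidates, word, target_emotion, intensity):
    """Pick the candidate with the lowest emotional target cost."""
    return min(
        candidates,
        key=lambda u: emotional_target_cost(u, word, target_emotion, intensity),
    )


candidates = [Unit("n1", "neutral"), Unit("h1", "happy"), Unit("a1", "angry")]
for word, intensity in [("wonderful", 0.5), ("table", 0.5), ("table", 0.9)]:
    chosen = select_unit(candidates, word, "happy", intensity)
    print(word, intensity, chosen.emotion)
```

In a full unit selection system this term would be one weighted component of the overall target cost, alongside the usual linguistic and prosodic features; the binary 0/1 cost here is only the simplest choice that shows the bias.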