
dc.contributor.advisor: Clark, Robert
dc.contributor.advisor: Aylett, Matthew
dc.contributor.author: Andersson, Johan Sebastian
dc.date.accessioned: 2014-06-03T10:21:58Z
dc.date.available: 2014-06-03T10:21:58Z
dc.date.issued: 2013-11-28
dc.identifier.uri: http://hdl.handle.net/1842/8891
dc.description.abstract: Conventional synthetic voices can synthesise neutral read-aloud speech well. However, to make synthetic speech suitable for a wider range of applications, the voices need to express more than just word identity. We need to develop voices that can take part in a conversation and express, for example, agreement, disagreement and hesitation in a natural and believable manner. Two frameworks currently dominate speech synthesis: unit selection and HMM-based speech synthesis. Both use recordings of human speech to build synthetic voices. Although the content of the recordings determines the segmental and prosodic phenomena that can be synthesised, surprisingly little research has been done on using the corpus to extend the limited behaviour of conventional synthetic voices. In this thesis we show how natural-sounding conversational characteristics can be added to both unit selection and HMM-based synthetic voices by adding speech from a spontaneous conversation to the voices. We recorded a spontaneous conversation and, by manually transcribing and selecting utterances, obtained approximately two thousand utterances from it. These conversational utterances were rich in conversational speech phenomena, but they lacked the general coverage that allows unit selection and HMM-based synthesis techniques to synthesise high-quality speech. We therefore investigated a number of blending approaches in which the conversational utterances were augmented with conventional read-aloud speech. The synthetic voices that contained conversational speech were compared with conventional voices without conversational speech. The perceptual evaluations showed that listeners generally perceived the conversational voices as having a more conversational style than the conventional voices.
This conversational style was largely due to the conversational voices’ ability to synthesise utterances containing conversational speech phenomena more naturally than the conventional voices. Additionally, we conducted an experiment showing that natural-sounding conversational characteristics in synthetic speech can convey pragmatic information about a topic to a listener, in our case an impression of certainty or uncertainty. We conclude that the limited behaviour of conventional synthetic voices can be enriched by using conversational speech in both unit selection and HMM-based speech synthesis. [en_US]
dc.language.iso: en [en_US]
dc.publisher: The University of Edinburgh [en_US]
dc.relation.hasversion: Andersson, S., Badino, L., Watts, O., and Aylett, M. (2008). The CSTR/CereProc Blizzard entry 2008: The inconvenient data. In The Blizzard Challenge, Brisbane, Australia. [en_US]
dc.relation.hasversion: Andersson, S., Georgila, K., Traum, D., Aylett, M., and Clark, R. (2010a). Prediction and realisation of conversational characteristics by utilising spontaneous speech for unit selection. In Speech Prosody, volume 100116, pages 1–4, Chicago, U.S.A. [en_US]
dc.relation.hasversion: Andersson, S., Yamagishi, J., and Clark, R. (2010b). Utilising spontaneous conversational speech in HMM-based speech synthesis. In SSW7, pages 173–178, Kyoto, Japan. [en_US]
dc.relation.hasversion: Andersson, S., Yamagishi, J., and Clark, R. (2012). Synthesis and evaluation of conversational characteristics in HMM-based speech synthesis. Speech Communication, 54(2):175–188. [en_US]
dc.relation.hasversion: Badino, L., Andersson, S., Yamagishi, J., and Clark, R. (2009). Identification of contrast and its emphatic realisation in HMM based speech synthesis. In Interspeech, pages 520–523, Brighton, U.K. [en_US]
dc.subject: Speech synthesis [en_US]
dc.subject: Conversation [en_US]
dc.subject: Unit selection [en_US]
dc.title: Synthesis and evaluation of conversational characteristics in speech synthesis [en_US]
dc.type: Thesis or Dissertation [en_US]
dc.type.qualificationlevel: Doctoral [en_US]
dc.type.qualificationname: PhD Doctor of Philosophy [en_US]

