
dc.contributor.advisor: Clark, Robert [en]
dc.contributor.advisor: Aylett, Matthew [en]
dc.contributor.author: Andersson, Johan Sebastian [en]
dc.date.accessioned: 2014-06-03T10:21:58Z
dc.date.available: 2014-06-03T10:21:58Z
dc.date.issued: 2013-11-28
dc.identifier.uri: http://hdl.handle.net/1842/8891
dc.description.abstract: Conventional synthetic voices can synthesise neutral read-aloud speech well. However, to make synthetic speech suitable for a wider range of applications, the voices need to express more than just the identity of the words. We need to develop voices that can take part in a conversation and express, for example, agreement, disagreement and hesitation in a natural and believable manner. In speech synthesis there are currently two dominant frameworks: unit selection and HMM-based speech synthesis. Both frameworks use recordings of human speech to build synthetic voices. Although the content of the recordings determines the segmental and prosodic phenomena that can be synthesised, surprisingly little research has been done on using the corpus to extend the limited behaviour of conventional synthetic voices. In this thesis we show how natural-sounding conversational characteristics can be added to both unit selection and HMM-based synthetic voices by adding speech from a spontaneous conversation to the voices. We recorded a spontaneous conversation and, by manually transcribing it and selecting utterances, obtained approximately two thousand utterances. These conversational utterances were rich in conversational speech phenomena, but they lacked the general coverage that allows unit selection and HMM-based synthesis techniques to synthesise high-quality speech. We therefore investigated a number of blending approaches in which the conversational utterances were augmented with conventional read-aloud speech. The synthetic voices that contained conversational speech were compared with conventional voices without conversational speech. The perceptual evaluations showed that listeners generally perceived the conversational voices as having a more conversational style than the conventional voices.
This conversational style was largely due to the conversational voices’ ability to synthesise utterances containing conversational speech phenomena more naturally than the conventional voices. Additionally, we conducted an experiment showing that natural-sounding conversational characteristics in synthetic speech can convey pragmatic information to a listener, in our case an impression of certainty or uncertainty about a topic. We conclude that the limited behaviour of conventional synthetic voices can be enriched by utilising conversational speech in both unit selection and HMM-based speech synthesis. [en]
dc.language.iso: en
dc.publisher: The University of Edinburgh [en]
dc.relation.hasversion: Andersson, S., Badino, L., Watts, O., and Aylett, M. (2008). The CSTR/CereProc Blizzard entry 2008: The inconvenient data. In The Blizzard Challenge, Brisbane, Australia. [en]
dc.relation.hasversion: Andersson, S., Georgila, K., Traum, D., Aylett, M., and Clark, R. (2010a). Prediction and realisation of conversational characteristics by utilising spontaneous speech for unit selection. In Speech Prosody, volume 100116, pages 1–4, Chicago, U.S.A. [en]
dc.relation.hasversion: Andersson, S., Yamagishi, J., and Clark, R. (2010b). Utilising spontaneous conversational speech in HMM-based speech synthesis. In SSW7, pages 173–178, Kyoto, Japan. [en]
dc.relation.hasversion: Andersson, S., Yamagishi, J., and Clark, R. (2012). Synthesis and evaluation of conversational characteristics in HMM-based speech synthesis. Speech Communication, 54(2):175–188. [en]
dc.relation.hasversion: Badino, L., Andersson, S., Yamagishi, J., and Clark, R. (2009). Identification of contrast and its emphatic realisation in HMM based speech synthesis. In Interspeech, pages 520–523, Brighton, U.K. [en]
dc.subject: Speech synthesis [en]
dc.subject: Conversation [en]
dc.subject: Unit selection [en]
dc.title: Synthesis and evaluation of conversational characteristics in speech synthesis [en]
dc.type: Thesis or Dissertation [en]
dc.type.qualificationlevel: Doctoral [en]
dc.type.qualificationname: PhD Doctor of Philosophy [en]