Speech Synthesis Without a Phone Inventory
Interspeech
dc.contributor.author | Aylett, Matthew | |
dc.contributor.author | King, Simon | |
dc.contributor.author | Yamagishi, Junichi | |
dc.date.accessioned | 2010-10-12T13:01:28Z | |
dc.date.available | 2010-10-12T13:01:28Z | |
dc.date.issued | 2009 | en |
dc.identifier.uri | http://hdl.handle.net/1842/3909 | |
dc.description.abstract | In speech synthesis the unit inventory is decided using phonological and phonetic expertise. This process is resource intensive and potentially sub-optimal. In this paper we investigate how acoustic clustering, together with lexicon constraints, can be used to build a self-organised inventory. Six English speech synthesis systems were built using two frameworks, unit selection and parametric HTS, for three inventory conditions: 1) a traditional phone set, 2) a system using orthographic units, and 3) a self-organised inventory. A listening test showed a strong preference for the classic phone-based system, and for the orthographic system over the self-organised system. Results also varied by letter-to-sound complexity and database coverage. This suggests that the self-organised approach failed to generalise pronunciation and also introduced noise above and beyond that caused by orthographic sound mismatch. | en |
dc.title | Speech Synthesis Without a Phone Inventory | en |
dc.type | Conference Paper | en |
rps.title | Interspeech | en |
dc.date.updated | 2010-10-12T13:01:29Z | |
dc.date.openingDate | 2009 | |