Large Scale Speech Synthesis Evaluation
In speech synthesis evaluation, it is critical to know precisely what affects the results, rather than relying on vague notions such as "good quality speech". Since we have so far had to depend on listeners' subjective judgements, a deeper understanding of the mechanisms behind those judgements could lead, first, to improvements in synthetic speech that address exactly the points listeners find relevant; second, to better subjective evaluation designs; and third, to an objective evaluation method offering stable and reliable comparison both across and within systems. We base our work on data from a large-scale speech synthesis evaluation challenge, the Blizzard Challenge 2007, and use a Multidimensional Scaling technique to map the acoustic features that listeners attend to when evaluating synthetic speech onto the dimensions along which the systems differ. We then present the results of a perceptual experiment conducted to test the resulting hypothesis. The final parts of the thesis discuss the results and suggest some new directions for speech synthesis that follow from our findings.
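As a minimal sketch of the kind of analysis described above: classical (Torgerson) Multidimensional Scaling embeds a matrix of pairwise dissimilarities between systems into a low-dimensional space whose axes can then be inspected for correlations with acoustic features. The dissimilarity values below are purely hypothetical placeholders standing in for aggregated listener judgements; the function `classical_mds` is an illustrative implementation, not the exact procedure used in the thesis.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed an n x n dissimilarity matrix in k dims."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]      # pick the top-k eigenpairs
    scale = np.sqrt(np.maximum(vals[idx], 0.0))
    return vecs[:, idx] * scale           # n x k system coordinates

# Hypothetical pairwise dissimilarities between four synthesis systems,
# e.g. derived from listeners' similarity or preference judgements.
D = np.array([
    [0.0, 0.3, 0.8, 0.9],
    [0.3, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
])
X = classical_mds(D, k=2)  # 2-D map: nearby points = systems judged similar
```

Systems whose output listeners judge similar end up close together in the resulting map; correlating each axis with measured acoustic properties is one way to interpret the perceptual dimensions.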