Parameter tuning for unit selection speech synthesis
Date: 2005
Item status: Restricted Access
Author: Keating, Joanna
Abstract
This project aims to contribute to current research on the quality of speech synthesis
by conducting a perceptual experiment to find a better set of target cost weights
for the Festival speech synthesis system. The experiment should also clarify which
acoustic parameters listeners attend to when judging synthetic speech, and how
important each parameter is.
The project uses unit selection synthesis, which chooses units from a recorded
database for concatenation by minimising a combination of target costs (how well a
candidate unit matches the required specification) and join costs (how well
adjacent units connect). Each cost is assigned a weight that indicates its importance
in the overall cost. This project manipulates the target cost weights in order to find
a set of values that better reflects listeners' perception of the quality of the
synthetic speech.
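To make the role of the weights more concrete, here is a minimal Python sketch of weighted target costs combined with a Viterbi search over candidate units. It is not Festival's implementation. The feature names (`stress`, `syl_pos`, `phrase_pos`), the binary mismatch sub-costs, the contiguity-based `join_cost`, and every helper name are hypothetical simplifications, and the toy database is invented purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical target-cost features, named after the parameters in the abstract.
FEATURES = ("stress", "syl_pos", "phrase_pos")


@dataclass(frozen=True)
class Unit:
    """A target specification or a candidate unit (illustrative fields only)."""
    phone: str
    stress: int          # 1 = stressed, 0 = unstressed
    syl_pos: str         # position in syllable, e.g. "onset", "nucleus", "coda"
    phrase_pos: str      # position in phrase, e.g. "initial", "medial", "final"
    source_id: int = -1  # index in the recorded database (-1 for targets)


def target_cost(target, cand, weights):
    """Weighted sum of sub-costs; each sub-cost is 1 on a mismatch, else 0 (assumed)."""
    return sum(weights[f] * (getattr(target, f) != getattr(cand, f)) for f in FEATURES)


def join_cost(left, right):
    """Assumed join cost: free if the units were contiguous in the recordings."""
    return 0.0 if right.source_id == left.source_id + 1 else 1.0


def select_units(targets, candidates, weights, join_weight=1.0):
    """Viterbi search for the candidate sequence with the lowest total cost."""
    # best[i][j] = (lowest cost of a path ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], c, weights), -1) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_weight * join_cost(prev, c), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], c, weights), back))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


# Toy example (invented data): two targets, two candidates each.
targets = [Unit("a", 1, "nucleus", "final"), Unit("t", 0, "coda", "final")]
cands = [
    [Unit("a", 0, "nucleus", "final", 10), Unit("a", 1, "nucleus", "medial", 20)],
    [Unit("t", 0, "coda", "final", 11), Unit("t", 0, "coda", "medial", 21)],
]
w_stress = {"stress": 1.0, "syl_pos": 0.0, "phrase_pos": 0.0}
w_phrase = {"stress": 0.0, "syl_pos": 0.0, "phrase_pos": 1.0}
picked_stress = select_units(targets, cands, w_stress)
picked_phrase = select_units(targets, cands, w_phrase)
print([u.source_id for u in picked_stress])  # [20, 21]
print([u.source_id for u in picked_phrase])  # [10, 11]
```

Giving a weight of 1 to stress alone selects the contiguous units at positions 20 and 21. Giving a weight of 1 to phrase position alone switches the choice to positions 10 and 11. Even in this toy setting, reweighting a single target-cost parameter changes which units are drawn from the database.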
Previous research shows that perceptual experiments are a common way of
evaluating the quality of speech synthesis, and this project uses a listening experiment
consisting of paired comparisons to reveal how listeners judge synthetic speech.
The results were analysed using multidimensional scaling (MDS) to show the
structure of the data and provide insight into the processes involved in speech
perception.
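The abstract does not specify the response format or the MDS variant used, so the following Python code is only a sketch under stated assumptions. It assumes each paired comparison yields a numeric dissimilarity rating, averages those ratings into a symmetric matrix, and then applies classical (Torgerson) MDS, finding the leading eigenvectors by power iteration so that no third-party libraries are needed. The `(stimulus_a, stimulus_b, rating)` input format and all function names are hypothetical. The check at the end uses distances between four known points rather than experimental data.

```python
import itertools
import math
import random


def dissimilarity_matrix(judgements, stimuli):
    """Average (stimulus_a, stimulus_b, rating) triples into a symmetric matrix."""
    index = {s: i for i, s in enumerate(stimuli)}
    n = len(stimuli)
    total = [[0.0] * n for _ in range(n)]
    count = [[0] * n for _ in range(n)]
    for a, b, rating in judgements:
        i, j = index[a], index[b]
        # Treat the pair as unordered: A-B and B-A ratings are pooled.
        for p, q in ((i, j), (j, i)):
            total[p][q] += rating
            count[p][q] += 1
    d = [[0.0] * n for _ in range(n)]
    for i, j in itertools.permutations(range(n), 2):
        if count[i][j] == 0:
            raise ValueError(f"no rating for pair {stimuli[i]!r}, {stimuli[j]!r}")
        d[i][j] = total[i][j] / count[i][j]
    return d


def _matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def classical_mds(d, dims=2, iters=1000, seed=0):
    """Classical (Torgerson) MDS using power iteration with deflation."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    mean = [sum(row) / n for row in sq]  # row means = column means (symmetric)
    grand = sum(mean) / n
    # Double-centre the squared dissimilarities: B = -1/2 * J D^2 J.
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Shifting by a Gershgorin bound makes B + shift*I positive semidefinite,
        # so power iteration converges to the largest (most positive) eigenvalue of B.
        shift = max(sum(abs(x) for x in row) for row in b)
        if shift == 0.0:
            break
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [x + shift * y for x, y in zip(_matvec(b, v), v)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, _matvec(b, v)))  # Rayleigh quotient
        if lam <= 0.0:
            break  # no further Euclidean dimensions
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflate so the next pass finds the next-largest eigenvalue.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Sanity check with invented inputs: distances between four known 2-D points.
points = {"p": (0.0, 0.0), "q": (4.0, 0.0), "r": (0.0, 1.0), "s": (4.0, 1.0)}
stimuli = list(points)
judgements = [(a, b, math.dist(points[a], points[b]))
              for a, b in itertools.combinations(stimuli, 2)]
d = dissimilarity_matrix(judgements, stimuli)
xy = classical_mds(d)
```

Classical MDS treats the averaged ratings as metric distances. A non-metric variant, which preserves only the rank order of the dissimilarities, may suit ordinal listener judgements better.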
The results showed that, when judging synthetic speech, participants attend to the
position-in-phrase, position-in-syllable, and stress parameters. Participants also
grouped the stimuli according to which of these parameters was given a weight of 1.
In addition, a lack of weight on these parameters affected the selection of units
from the database more than a large weight did. Analysis of the results showed that
position in syllable was the most important parameter for high-quality speech.