Join Cost for Unit Selection Speech Synthesis

Vepa, Jithendra

dc.contributor.advisor

King, Simon

en

dc.contributor.advisor

Taylor, Paul

en

dc.contributor.author

Vepa, Jithendra

en

dc.date.accessioned

2006-10-18T10:32:51Z

dc.date.available

2006-10-18T10:32:51Z

dc.date.issued

2004-07

dc.description.abstract

Undoubtedly, state-of-the-art unit selection-based concatenative speech systems produce very high quality synthetic speech. This is due to a large speech database containing many instances of each speech unit, with a varied and natural distribution of prosodic and spectral characteristics. The join cost, which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from this large speech database. The ideal join cost is one that measures perceived discontinuity based on easily measurable spectral properties of the units being joined, in order to ensure smooth and natural-sounding synthetic speech. During the first part of my research, I investigated various spectrally based distance measures for use in computation of the join cost by designing a perceptual listening experiment. A variation on the usual perceptual test paradigm is proposed in this thesis by deliberately including a wide range of qualities of join in polysyllabic words. The test stimuli are obtained using a state-of-the-art unit-selection text-to-speech system: rVoice from Rhetorical Systems Ltd. Three spectral features, namely Mel-frequency cepstral coefficients (MFCC), line spectral frequencies (LSF) and multiple centroid analysis (MCA) parameters, and various statistical distances (Euclidean, Kullback-Leibler, Mahalanobis) are used to obtain distance measures. Based on the correlations between perceptual scores and these spectral distances, I proposed new spectral distance measures, which correlate well with human perception of concatenation discontinuities. The second part of my research concentrates on combining join cost computation and the smoothing operation, which is required to disguise joins, by learning an underlying representation from the acoustic signal. In order to accomplish this task, I have chosen linear dynamic models (LDM), sometimes known as Kalman filters.
Three different initialisation schemes are used prior to Expectation-Maximisation (EM) in LDM training. Once the models are trained, the join cost is computed based on the error between model predictions and actual observations. Analytical measures are derived based on the shape of this error plot. These measures and initialisation schemes are compared by computing correlations using the perceptual data. The LDMs are also able to smooth the observations, which are then used to synthesise speech. To evaluate the LDM smoothing operation, another listening test is performed in which it is compared with the standard method (simple linear interpolation). In the third part of my research, I compared the best three join cost functions, chosen from the first and second parts, subjectively using a listening test. In this test, I also evaluated different smoothing methods: no smoothing, linear smoothing and smoothing achieved using LDMs.
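
As a rough illustration of the spectral distance measures named in the abstract, the following minimal sketch computes a Euclidean and a Mahalanobis distance between the feature vectors on either side of a join. It is not the thesis's implementation: the function names, the three-dimensional example vectors and the covariance matrices are hypothetical, and a real system would use full MFCC, LSF or MCA frames with a covariance estimated from the speech database.

```python
import numpy as np


def euclidean_join_cost(left_frame, right_frame):
    """Euclidean distance between the feature vectors on either side of a join."""
    diff = np.asarray(left_frame, dtype=float) - np.asarray(right_frame, dtype=float)
    return float(np.sqrt(np.dot(diff, diff)))


def mahalanobis_join_cost(left_frame, right_frame, covariance):
    """Mahalanobis distance between the two frames under a given covariance."""
    diff = np.asarray(left_frame, dtype=float) - np.asarray(right_frame, dtype=float)
    # Solve covariance @ x = diff rather than forming the explicit inverse.
    solved = np.linalg.solve(np.asarray(covariance, dtype=float), diff)
    return float(np.sqrt(np.dot(diff, solved)))


# Hypothetical feature vectors at a join (assumed values, for illustration only).
left = [1.0, 2.0, 3.0]
right = [1.0, 4.0, 3.0]

# Assumed covariances: identity, and a diagonal one with more variance in dimension 2.
identity_cov = np.eye(3)
scaled_cov = np.diag([1.0, 4.0, 1.0])

euclidean = euclidean_join_cost(left, right)
mahalanobis_identity = mahalanobis_join_cost(left, right, identity_cov)
mahalanobis_scaled = mahalanobis_join_cost(left, right, scaled_cov)
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean distance; a larger variance in a dimension reduces the contribution of a mismatch in that dimension.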

en

dc.format.extent

1842877 bytes

en

dc.format.mimetype

application/pdf

en

dc.identifier.uri

http://hdl.handle.net/1842/1452

dc.language.iso

en

dc.publisher

The University of Edinburgh. College of Science and Engineering. School of Informatics

en

dc.subject.other

unit selection

en

dc.subject.other

join cost

en

dc.subject.other

speech synthesis

en

dc.subject.other

polysyllabic words

en

dc.subject.other

line spectral frequencies

en

dc.subject.other

multiple centroid analysis

en

dc.subject.other

kalman filters

en

dc.title

Join Cost for Unit Selection Speech Synthesis

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: JithendraVepa_PhDthesis.pdf
Size: 1.76 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

CSTR thesis and dissertation collection