Synthesizing fundamental frequency using models automatically trained from data
dc.contributor.author
Dusterhoff, Kurt Edward
en
dc.date.accessioned
2019-02-15T14:21:30Z
dc.date.available
2019-02-15T14:21:30Z
dc.date.issued
2000
dc.description.abstract
This thesis presents a methodology for use in building intonation synthesis models which are automatically trained from annotated speech data. The research investigates four subtopics: intonation synthesis, automatic intonation analysis, intonation evaluation, and interactions between intonation and speech segments (phones).
The primary goal of this research is to produce stochastic models which can be used to generate fundamental frequency contours for synthetic utterances. The models produced are binary decision trees which are used to predict a parameterized description of fundamental frequency for an utterance. These models are trained using the sort of information which is typically available to a speech synthesizer during intonation generation. For example, the speech database is annotated with information about the location of word, phrase, segment, and syllable boundaries. The decision trees ask questions about such information.
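The tree-as-questions idea can be sketched as follows. This is a hand-written illustration, not the thesis implementation: the feature names and the returned parameter values are invented, and a trained tree would learn its questions and leaf values from the annotated database.

```python
# Illustrative sketch (not the thesis implementation): a binary decision
# tree predicts a parameterized F0 description for a syllable by asking
# yes/no questions about context a synthesizer already has at generation
# time. All feature names and numeric values here are invented.

def predict_f0_params(syll):
    """Return (accent peak height in Hz, peak position within the syllable)."""
    if syll["stressed"]:                  # question 1: lexical stress?
        if syll["phrase_initial"]:        # question 2: position in phrase?
            return (190.0, 0.6)
        return (165.0, 0.5)
    if syll["phrase_final"]:              # question about boundary location
        return (110.0, 0.4)
    return (130.0, 0.5)

syllable = {"stressed": True, "phrase_initial": False, "phrase_final": False}
print(predict_f0_params(syllable))  # -> (165.0, 0.5)
```

In a trained system, both the questions (which feature, which threshold) and the leaf parameters would be induced automatically from the annotated speech data rather than written by hand.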
One obvious problem facing the stochastic modelling approach to intonation synthesis models is obtaining data with the appropriate intonation annotation. This thesis presents a method by which such an annotation can be automatically derived for an utterance. The method uses Hidden Markov Models to label speech with intonation event boundaries given fundamental frequency, energy, and Mel frequency cepstral coefficients. Intonation events are fundamental frequency movements which relate to constituents larger than the syllable nucleus.
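The HMM labelling step can be sketched with a toy two-state model and Viterbi decoding. This is only an illustration: the thesis system observes fundamental frequency, energy, and MFCC features with trained emission distributions, whereas here the observations are quantised F0-movement symbols and every probability is invented.

```python
# Illustrative sketch (not the thesis HMM system): a two-state HMM with
# Viterbi decoding labels each frame as inside or outside an intonation
# event. Transition and emission probabilities below are invented.
import math

states = ["outside", "event"]
log = math.log
trans = {"outside": {"outside": 0.8, "event": 0.2},
         "event":   {"outside": 0.3, "event": 0.7}}
emit = {"outside": {"flat": 0.7, "rise": 0.2, "fall": 0.1},
        "event":   {"flat": 0.1, "rise": 0.5, "fall": 0.4}}
start = {"outside": 0.9, "event": 0.1}

def viterbi(obs):
    """Return the most likely state sequence for a list of observations."""
    v = [{s: log(start[s]) + log(emit[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: v[-1][p] + log(trans[p][s]))
            ptr[s] = best
            col[s] = v[-1][best] + log(trans[best][s]) + log(emit[s][o])
        v.append(col)
        back.append(ptr)
    path = [max(states, key=lambda s: v[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

frames = ["flat", "flat", "rise", "rise", "fall", "flat"]
print(viterbi(frames))
# -> ['outside', 'outside', 'event', 'event', 'event', 'outside']
```

The decoded state sequence marks the rise-fall stretch as an intonation event; a real system would derive event boundaries from such state changes over acoustic feature vectors.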
Even if there is an abundance of fully labelled speech data, and the intonation synthesis models appear robust, it is important to produce an evaluation of the resulting intonation contours which allows comparison with other intonation synthesis methods. Such an evaluation could be used to compare versions of the same basic methodology or completely different methodologies. The question of intonation evaluation is addressed in this thesis in terms of system development. Objective methods of evaluating intonation contours are reviewed with regard to their ability to regularly provide feedback which can be used to improve the systems being evaluated.
The fourth area investigated in this thesis is the interaction between segmental (phone) and suprasegmental (intonation) levels of speech. This investigation is not undertaken separately from the other investigations. Questions about phone-intonation interaction form a part of the research in both intonation synthesis and intonation analysis.
The research in this thesis has resulted in a methodology which can be
used to automatically train and evaluate stochastic models for intonation
synthesis from automatically annotated speech databases.
en
dc.identifier.uri
http://hdl.handle.net/1842/33918
dc.publisher
The University of Edinburgh
en
dc.relation.ispartof
Annexe Thesis Digitisation Project 2019 Block 22
en
dc.relation.isreferencedby
Already catalogued
en
dc.title
Synthesizing fundamental frequency using models automatically trained from data
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en