Edinburgh Research Archive

Synthesizing fundamental frequency using models automatically trained from data

dc.contributor.author
Dusterhoff, Kurt Edward
en
dc.date.accessioned
2019-02-15T14:21:30Z
dc.date.available
2019-02-15T14:21:30Z
dc.date.issued
2000
dc.description.abstract
This thesis presents a methodology for use in building intonation synthesis models which are automatically trained from annotated speech data. The research investigates four subtopics: intonation synthesis, automatic intonation analysis, intonation evaluation, and interactions between intonation and speech segments (phones).

The primary goal of this research is to produce stochastic models which can be used to generate fundamental frequency contours for synthetic utterances. The models produced are binary decision trees which are used to predict a parameterized description of fundamental frequency for an utterance. These models are trained using the sort of information which is typically available to a speech synthesizer during intonation generation. For example, the speech database is annotated with information about the location of word, phrase, segment, and syllable boundaries. The decision trees ask questions about such information.

One obvious problem facing the stochastic modelling approach to intonation synthesis is obtaining data with the appropriate intonation annotation. This thesis presents a method by which such an annotation can be automatically derived for an utterance. The method uses Hidden Markov Models to label speech with intonation event boundaries given fundamental frequency, energy, and Mel frequency cepstral coefficients. Intonation events are fundamental frequency movements which relate to constituents larger than the syllable nucleus.

Even if there is an abundance of fully labelled speech data, and the intonation synthesis models appear robust, it is important to produce an evaluation of the resulting intonation contours which allows comparison with other intonation synthesis methods. Such an evaluation could be used to compare versions of the same basic methodology or completely different methodologies. The question of intonation evaluation is addressed in this thesis in terms of system development. Objective methods of evaluating intonation contours are reviewed with regard to their ability to regularly provide feedback which can be used to improve the systems being evaluated.

The fourth area investigated in this thesis is the interaction between segmental (phone) and suprasegmental (intonation) levels of speech. This investigation is not undertaken separately from the other investigations. Questions about phone-intonation interaction form a part of the research in both intonation synthesis and intonation analysis. The research in this thesis has resulted in a methodology which can be used to automatically train and evaluate stochastic models for intonation synthesis from automatically annotated speech databases.
en
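The abstract describes using Hidden Markov Models to label speech with intonation event boundaries from acoustic features. The core decoding step of any such HMM labeller can be sketched with a toy two-state Viterbi search. This is not the thesis's actual model: the states, transition probabilities, and the single-feature emission rule (a threshold on frame-to-frame F0 change) below are all hypothetical stand-ins for a real system that would model F0, energy, and MFCCs jointly.

```python
# Toy sketch (NOT the thesis's model): label frames "event" vs. "non-event"
# with a two-state HMM decoded by the Viterbi algorithm. Emissions come
# from a hypothetical rule on |F0 delta| per frame.
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence for a list of observations."""
    # V[t][s] = best log-probability of any path ending in state s at time t
    V = [{s: log_start[s] + log_emit(s, obs[0]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            best_prev = max(states, key=lambda p: V[t - 1][p] + log_trans[p][s])
            V[t][s] = V[t - 1][best_prev] + log_trans[best_prev][s] + log_emit(s, obs[t])
            back[t][s] = best_prev
    # Trace the best path backwards from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        last = back[t][last]
        path.append(last)
    return list(reversed(path))

# Hypothetical emission model: a large |F0 delta| favours the "event" state.
def log_emit(state, f0_delta):
    if state == "event":
        return math.log(0.8) if abs(f0_delta) > 5.0 else math.log(0.2)
    return math.log(0.2) if abs(f0_delta) > 5.0 else math.log(0.8)

states = ("event", "non-event")
log_start = {"event": math.log(0.5), "non-event": math.log(0.5)}
log_trans = {
    "event":     {"event": math.log(0.6), "non-event": math.log(0.4)},
    "non-event": {"event": math.log(0.4), "non-event": math.log(0.6)},
}

# Per-frame F0 deltas (Hz); the middle frames show a sharp movement.
f0_deltas = [0.5, 1.0, 8.0, 9.5, 7.0, 1.2, 0.3]
labels = viterbi(f0_deltas, states, log_start, log_trans, log_emit)
print(labels)
```

The transition probabilities mildly favour staying in the current state, so isolated noisy frames are smoothed over rather than flipping the label, which is the practical reason an HMM is preferred over classifying each frame independently.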
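The synthesis side of the methodology predicts a parameterized description of fundamental frequency rather than raw F0 values. As a minimal illustration of that idea, the sketch below expands a handful of hypothetical per-event parameters (amplitude, duration, peak position, baseline) into a frame-level contour; the abstract does not name a specific parameterization, so these parameters and the piecewise-linear rise-fall shape are illustrative assumptions only.

```python
# Toy sketch: regenerate a frame-level F0 contour from a hypothetical
# per-event parameterization (amplitude, duration, peak position, baseline).
# The rise-fall shape is an illustrative assumption, not the thesis's model.
def event_contour(amplitude, duration_s, peak_pos, baseline_hz, frame_s=0.01):
    """Render one rise-fall F0 event as a list of frame values in Hz."""
    n = int(round(duration_s / frame_s))          # number of frames
    peak = int(round(peak_pos * (n - 1)))         # frame index of the peak
    contour = []
    for i in range(n):
        if i <= peak:                             # linear rise up to the peak
            frac = i / peak if peak else 1.0
        else:                                     # linear fall after the peak
            frac = (n - 1 - i) / (n - 1 - peak)
        contour.append(baseline_hz + amplitude * frac)
    return contour

# A 200 ms event rising 40 Hz above a 120 Hz baseline, peaking mid-event.
f0 = event_contour(amplitude=40.0, duration_s=0.2, peak_pos=0.5, baseline_hz=120.0)
print(len(f0), max(f0))
```

In a full system, a decision tree would predict one such parameter set per intonation event from contextual questions (word, phrase, segment, and syllable boundaries), and the renderer would turn the predictions back into a continuous contour.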
dc.identifier.uri
http://hdl.handle.net/1842/33918
dc.publisher
The University of Edinburgh
en
dc.relation.ispartof
Annexe Thesis Digitisation Project 2019 Block 22
en
dc.relation.isreferencedby
Already catalogued
en
dc.title
Synthesizing fundamental frequency using models automatically trained from data
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
DusterhoffKE_2000redux.pdf
Size:
54.41 MB
Format:
Adobe Portable Document Format
Name:
DusterhoffKE_2000_Floppy.zip
Size:
42.47 KB
Format:
Unknown data format
