Edinburgh Research Archive

Synthesizing fundamental frequency using models automatically trained from data

Authors

Dusterhoff, Kurt Edward

Abstract


This thesis presents a methodology for use in building intonation synthesis models which are automatically trained from annotated speech data. The research investigates four subtopics: intonation synthesis, automatic intonation analysis, intonation evaluation, and interactions between intonation and speech segments (phones).

The primary goal of this research is to produce stochastic models which can be used to generate fundamental frequency contours for synthetic utterances. The models produced are binary decision trees which are used to predict a parameterized description of fundamental frequency for an utterance. These models are trained using the sort of information which is typically available to a speech synthesizer during intonation generation. For example, the speech database is annotated with information about the location of word, phrase, segment, and syllable boundaries. The decision trees ask questions about such information.

One obvious problem facing the stochastic modelling approach to intonation synthesis is obtaining data with the appropriate intonation annotation. This thesis presents a method by which such an annotation can be automatically derived for an utterance. The method uses Hidden Markov Models to label speech with intonation event boundaries given fundamental frequency, energy, and Mel frequency cepstral coefficients. Intonation events are fundamental frequency movements which relate to constituents larger than the syllable nucleus.

Even if there is an abundance of fully labelled speech data, and the intonation synthesis models appear robust, it is important to produce an evaluation of the resulting intonation contours which allows comparison with other intonation synthesis methods. Such an evaluation could be used to compare versions of the same basic methodology or completely different methodologies. The question of intonation evaluation is addressed in this thesis in terms of system development.
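The HMM-based labelling described above can be illustrated with a minimal Viterbi-decoding sketch. This is not the thesis's actual system: the two states, the transition and initial probabilities, and the per-frame observation scores below are all invented toy values standing in for acoustic likelihoods that would really be computed from fundamental frequency, energy, and cepstral features.

```python
# A toy two-state HMM that labels speech frames as inside or outside an
# intonation event via Viterbi decoding. All probabilities are invented
# for illustration; a real system would estimate them from labelled data.
import math

STATES = ["no_event", "event"]
LOG_INIT = {"no_event": math.log(0.9), "event": math.log(0.1)}
LOG_TRANS = {  # log transition probabilities (toy values)
    ("no_event", "no_event"): math.log(0.8), ("no_event", "event"): math.log(0.2),
    ("event", "no_event"): math.log(0.3), ("event", "event"): math.log(0.7),
}

def viterbi(frame_loglikes):
    """frame_loglikes: one dict per frame mapping state -> log P(obs | state)."""
    trellis = [{s: LOG_INIT[s] + frame_loglikes[0][s] for s in STATES}]
    backpointers = []
    for obs in frame_loglikes[1:]:
        column, pointers = {}, {}
        for state in STATES:
            best_prev = max(STATES, key=lambda p: trellis[-1][p] + LOG_TRANS[(p, state)])
            column[state] = trellis[-1][best_prev] + LOG_TRANS[(best_prev, state)] + obs[state]
            pointers[state] = best_prev
        trellis.append(column)
        backpointers.append(pointers)
    # Trace the best state sequence back from the final frame.
    state = max(STATES, key=lambda s: trellis[-1][s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))

# Toy per-frame log-likelihoods: two middle frames look like an F0 movement.
frames = [
    {"no_event": -1.0, "event": -3.0},
    {"no_event": -2.5, "event": -0.5},
    {"no_event": -2.5, "event": -0.5},
    {"no_event": -0.8, "event": -3.0},
]
labels = viterbi(frames)  # one label per frame
```

The transition probabilities encode the expectation that event and non-event regions persist over several frames, so isolated one-frame label flips are discouraged.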
Objective methods of evaluating intonation contours are reviewed with regard to their ability to regularly provide feedback which can be used to improve the systems being evaluated.

The fourth area investigated in this thesis is the interaction between segmental (phone) and suprasegmental (intonation) levels of speech. This investigation is not undertaken separately from the other investigations. Questions about phone-intonation interaction form a part of the research in both intonation synthesis and intonation analysis.

The research in this thesis has resulted in a methodology which can be used to automatically train and evaluate stochastic models for intonation synthesis from automatically annotated speech databases.
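Two objective measures commonly used to compare a synthesized contour against a natural reference are root-mean-square error and correlation. The sketch below computes both for a pair of short F0 contours; the contour values are invented for illustration and do not come from the thesis.

```python
# Minimal sketch of two objective F0-contour measures: RMSE (in Hz)
# and the Pearson correlation coefficient. Contour values are toy data.
import math

def rmse(predicted, reference):
    """Root-mean-square error between two equal-length contours."""
    return math.sqrt(
        sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    )

def correlation(predicted, reference):
    """Pearson correlation between two equal-length contours."""
    n = len(reference)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in reference))
    return cov / (sd_p * sd_r)

reference = [110.0, 120.0, 135.0, 128.0, 115.0]  # natural contour (Hz)
predicted = [112.0, 118.0, 130.0, 126.0, 118.0]  # synthesized contour (Hz)
error = rmse(predicted, reference)
corr = correlation(predicted, reference)
```

RMSE rewards being close in absolute frequency, while correlation rewards reproducing the shape of the contour; a system can score well on one measure and poorly on the other, which is one reason reviews of evaluation methods consider several measures together.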
