Dynamic Bayesian Network-based Speech Synthesis
Item status: Restricted Access
As the simplest form of dynamic Bayesian network (DBN), the hidden Markov model (HMM) has natural limits in speech synthesis when it comes to explicitly modelling segmental and suprasegmental prosodic properties (e.g. phone duration, syllable duration, and the F0 contour at the syllable level). Instead of continuing to explore new "add-ons" for the existing HMM-based speech synthesis system, this dissertation takes a different approach and performs speech synthesis within the complete DBN framework. The Graphical Models Toolkit (GMTK) is used to implement this novel system. As described in the dissertation, the DBN-based speech synthesis prototype is self-contained, i.e. all features are modelled within a standard DBN. The dissertation first introduces the development of speech synthesis and its current issues, then explains the HMM and its relation to (dynamic) Bayesian networks. After the theory part, the dissertation describes and illustrates in detail, step by step, how the DBN-based system is built and how phone duration is explicitly modelled. Finally, a brief objective evaluation is conducted. Since the current DBN-based system is only a prototype, many problems remain. Nevertheless, the DBN-based approach points to a promising direction for future work.
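The duration limitation mentioned in the abstract can be made concrete with a small sketch. In a standard HMM, the number of frames spent in a state with self-transition probability a follows a geometric distribution, P(d) = a^(d-1) (1 - a), so the most likely duration is always a single frame. The code below is an illustration only and is not taken from the dissertation; the function names and the value 0.8 are assumptions chosen for the example.

```python
def geometric_duration_pmf(self_loop_prob, max_frames):
    """Implicit HMM state-duration distribution for d = 1..max_frames.

    Staying d frames means taking the self-loop d-1 times and then leaving,
    so P(d) = a**(d-1) * (1 - a), where a is the self-transition probability.
    """
    a = self_loop_prob
    return [a ** (d - 1) * (1.0 - a) for d in range(1, max_frames + 1)]


def expected_duration(self_loop_prob):
    """Mean of the geometric duration distribution, in frames."""
    return 1.0 / (1.0 - self_loop_prob)


# Illustrative value only: a state with self-transition probability 0.8.
pmf = geometric_duration_pmf(0.8, 10)
mean_frames = expected_duration(0.8)
```

Because the probabilities decrease monotonically with d, this implicit distribution is a poor fit for real phone durations, which typically peak at some duration longer than one frame. This is one motivation for modelling duration explicitly.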