Transforming Voice Source Parameters in a HMM-based Speech Synthesiser with Glottal Post-Filtering
Proc. 7th ISCA Speech Synthesis Workshop (SSW7)
Date
2010
Author
Cabral, Joao P
Renals, Steve
Richmond, Korin
Yamagishi, Junichi
Abstract
Control over voice quality, e.g. breathy and tense voice, is important for speech synthesis applications. For example, transformations can be used to modify aspects of the voice related to the speaker's identity and to improve expressiveness. However, it is hard to modify the voice characteristics of synthetic speech without degrading speech quality. State-of-the-art statistical speech synthesisers, in particular, do not typically allow control over parameters of the glottal source, which are strongly correlated with voice quality. Consequently, the control of voice characteristics in these systems is limited. In contrast, the HMM-based speech synthesiser proposed in this paper uses an acoustic glottal source model. The system passes the glottal signal through a whitening filter to obtain the excitation of voiced sounds. This technique, called glottal post-filtering, allows the voice characteristics of the synthetic speech to be transformed by modifying the source model parameters. We evaluated the proposed synthesiser in a perceptual experiment, in terms of speech naturalness, intelligibility, and similarity to the original speaker's voice. The results show that it performed as well as an HMM-based synthesiser that generates the speech signal with a commonly used high-quality speech vocoder.
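
The idea of passing a glottal signal through a whitening filter can be illustrated with a minimal sketch. It is not the system described in the paper: the abstract names neither the glottal source model nor the design of the whitening filter, so the Rosenberg-style pulse, the `open_quotient` parameter, the linear-prediction inverse filter, and all function names below are assumptions chosen for illustration. The sketch fits the inverse filter once on a reference source and reuses it for a modified source, which is one plausible way for a parameter change to survive in the excitation.

```python
import numpy as np

def glottal_pulse(n_samples, open_quotient=0.6):
    """Hypothetical Rosenberg-style glottal flow pulse for one pitch period."""
    n_open = max(2, int(open_quotient * n_samples))   # open phase length
    n_rise = max(1, int(0.67 * n_open))               # opening part
    n_fall = n_open - n_rise                          # closing part
    pulse = np.zeros(n_samples)                       # closed phase stays at zero
    pulse[:n_rise] = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_rise) / n_rise))
    pulse[n_rise:n_open] = np.cos(0.5 * np.pi * np.arange(n_fall) / n_fall)
    return pulse

def lpc_inverse_filter(signal, order):
    """Coefficients of A(z) = 1 - sum_k a_k z^-k (autocorrelation method)."""
    n = len(signal)
    r = np.correlate(signal, signal, mode="full")[n - 1:n + order]  # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)                  # small ridge for stability
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))

def whiten(signal, inverse_filter):
    """Apply the FIR inverse filter and keep the original length."""
    return np.convolve(signal, inverse_filter)[:len(signal)]

# Assumed setup: 80 samples per period (200 Hz at a 16 kHz sampling rate).
period = 80
reference = np.tile(glottal_pulse(period, open_quotient=0.6), 20)
modified = np.tile(glottal_pulse(period, open_quotient=0.8), 20)

# Fit the whitening filter on the reference source, then reuse it.
inverse_filter = lpc_inverse_filter(reference, order=10)
excitation_ref = whiten(reference, inverse_filter)
excitation_mod = whiten(modified, inverse_filter)
```

In this sketch, `excitation_ref` is the spectrally flattened version of the reference source, while `excitation_mod` keeps the difference introduced by the larger open quotient because the filter was not re-estimated. Whether the paper derives its whitening filter in this way is not stated in the abstract.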