Nonlinear analysis of speech from a synthesis perspective
With the emergence of nonlinear dynamical systems analysis over recent years, it has become clear that conventional time-domain and frequency-domain approaches to speech synthesis may be far from optimal. Using state space reconstructions of the time-domain speech signal it is, at least in theory, possible to investigate a number of invariant geometrical measures of the underlying system. These measures give a more thorough understanding of the dynamics of the system and therefore of the form that any model should take. This thesis introduces a number of nonlinear dynamical analysis tools, which are then applied to a database of vowels to extract the underlying invariant geometrical properties. The results of this analysis are then applied, using ideas taken from nonlinear dynamics, to the problem of speech synthesis, and a novel synthesis technique is described and demonstrated. The tools used for the analysis are time delay embedding, singular value decomposition, correlation dimension, local singular value analysis, Lyapunov spectra and short-term prediction properties. Although many papers have been written about these tools, and algorithms proposed, there are currently no generally accepted techniques, especially for the calculation of Lyapunov spectra in the presence of noise and data length limitations. This thesis introduces all of the above tools and looks in detail at Lyapunov exponents, proposing two major novel modifications that are demonstrated to be more robust than conventional techniques. These robust techniques are applied to a large database of vowel sounds, showing that the vowels tested exhibit evidence of nonlinear, low-dimensional, non-chaotic behaviour. The evidence of non-chaotic behaviour is of particular importance from a synthesis point of view and is used in the final section of the thesis, which introduces a novel synthesis technique.
The synthesis technique, which is based on ideas taken from nonlinear dynamics theory, is detailed and demonstrated, showing that it is capable of producing high-quality, natural-sounding speech.
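
As an illustration of the first two analysis tools named above, time delay embedding followed by singular value decomposition, the following is a minimal sketch and not the implementation used in the thesis. The function names `delay_embed` and `singular_spectrum`, the embedding dimension, the delay, the sampling rate and the synthetic two-harmonic test signal are all assumptions chosen for illustration; a real analysis would use recorded vowel frames and would select the delay and dimension from the data.

```python
import numpy as np


def delay_embed(x, dim, tau):
    """Build delay vectors [x[n], x[n+tau], ..., x[n+(dim-1)*tau]] as rows.

    dim and tau are illustrative parameters (hypothetical values below).
    """
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("signal too short for this dim and tau")
    # Column i holds the signal shifted by i * tau samples.
    return np.column_stack([x[i * tau : i * tau + n_vectors] for i in range(dim)])


def singular_spectrum(trajectory):
    """Return the singular values of the centred trajectory matrix, normalised to sum to 1."""
    centred = trajectory - trajectory.mean(axis=0)
    s = np.linalg.svd(centred, compute_uv=False)
    return s / s.sum()


# Synthetic stand-in for a voiced frame: two harmonics of an assumed 120 Hz pitch.
fs = 16000  # assumed sampling rate in Hz
t = np.arange(2048) / fs
signal = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 240 * t)

trajectory = delay_embed(signal, dim=7, tau=8)
spectrum = singular_spectrum(trajectory)
```

For this noise-free test signal only the first few normalised singular values are significantly non-zero, which is the kind of indication of a low-dimensional reconstructed state space that the singular value analysis looks for. With real speech the spectrum decays to a noise floor rather than to zero.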