Nonlinear analysis of speech from a synthesis perspective
Date
06/1996
Author
Banbrook, Michael
Abstract
With the emergence of nonlinear dynamical systems analysis over recent years, it has
become clear that conventional time domain and frequency domain approaches to
speech synthesis may be far from optimal. Using state space reconstructions of the
time domain speech signal it is, at least in theory, possible to investigate a number of
invariant geometrical measures for the underlying system which give a more thorough
understanding of the dynamics of the system and therefore the form that any model
should take. This thesis introduces a number of nonlinear dynamical analysis tools
which are then applied to a database of vowels to extract the underlying invariant
geometrical properties. The results of this analysis are then applied, using ideas taken
from nonlinear dynamics, to the problem of speech synthesis and a novel synthesis
technique is described and demonstrated.
The tools used for the analysis are time delay embedding, singular value decomposition,
correlation dimension, local singular value analysis, Lyapunov spectra and short
term prediction properties. Although there have been many papers written about
these tools, and algorithms proposed, there are currently no generally accepted techniques,
especially for the calculation of Lyapunov spectra in the presence of noise
and data length limitations. This thesis introduces all of the above tools and examines
Lyapunov exponents in detail, proposing two major novel modifications that
are demonstrated to be more robust than conventional techniques.
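
As a rough illustration of the first and third tools listed above, the sketch below builds a time delay embedding of a scalar signal and computes a correlation sum on the embedded points. It is not taken from the thesis: the function names `delay_embed` and `correlation_sum`, the test signal, and the parameter values are all assumptions chosen for illustration. A correlation dimension estimate would typically be read from the slope of log C(r) against log r over a range of radii, a step this sketch leaves out.

```python
import math


def delay_embed(signal, dim, tau):
    """Return delay vectors (x[i], x[i + tau], ..., x[i + (dim - 1) * tau])."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_vectors = len(signal) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("signal too short for this dim and tau")
    return [
        tuple(signal[i + k * tau] for k in range(dim))
        for i in range(n_vectors)
    ]


def correlation_sum(vectors, radius):
    """Fraction of distinct vector pairs closer than radius (Euclidean)."""
    n = len(vectors)
    if n < 2:
        raise ValueError("need at least two vectors")
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(vectors[i], vectors[j]) < radius:
                close += 1
    return 2.0 * close / (n * (n - 1))


# Hypothetical test signal: a pure sine wave stands in for a sampled vowel.
signal = [math.sin(2 * math.pi * i / 50) for i in range(500)]
vectors = delay_embed(signal, dim=3, tau=12)  # dim and tau are illustrative
c_small = correlation_sum(vectors, 0.1)
c_large = correlation_sum(vectors, 0.5)
```

The pairwise loop is quadratic in the number of vectors, which is acceptable for a short sketch but would need a neighbour search structure for a large speech database.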
The novel robust techniques are applied to a large database of vowel sounds, and
the vowels tested show evidence of nonlinear, low-dimensional, non-chaotic behaviour.
It is particularly the evidence of non-chaotic behaviour that is of importance
from a synthesis point of view and is used in the final section of the thesis, which
introduces a novel synthesis technique. The synthesis technique, which is based on
ideas taken from nonlinear dynamics theory, is detailed and demonstrated, showing
that it is capable of high-quality, natural-sounding speech.