Automatic intonation analysis using acoustic data.
Date: 1999
Author: Dusterhoff, Kurt E
Abstract
In a research world where many human-hours are spent labelling,
segmenting, checking, and rechecking various levels
of linguistic information, it is obvious that automatic
analysis can lower the costs (in time as well as funding) of
linguistic annotation. More importantly, automatic speech
analysis coupled with automatic speech generation allows
human-computer interaction to advance towards spoken dialogue.
Automatic intonation analysis can aid this advance
in both the speaker and hearer roles of computational dialogue.
Real-time intonation analysis can enable the use of
intonational cues in speech recognition and understanding
tasks. Automatic analysis of developmental speech databases
allows researchers to easily expand the range of data they
model for intonation generation.
This paper presents a series of experiments that test the
use of acoustic data in the automatic detection of Tilt
intonation events. A set of speaker-dependent HMMs is used
to detect accents, boundaries, connections, and silences. A
baseline result is obtained, following Taylor [8], by
training the models on fundamental frequency and RMS energy.
These baseline figures are then compared with the results of
experiments that augment the F0 and energy data with
cepstral coefficients. In all cases, the first and second
derivatives of each feature are included. The best results
show a relative error reduction of 12% over the baseline.
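As a rough illustration of the feature sets described above,
the sketch below assembles per-frame vectors of F0 and RMS
energy, optionally augmented with cepstral coefficients, and
appends the first and second derivatives of every feature.
This is a minimal sketch, not the original experimental
setup: librosa, hmmlearn, the frame parameters, the
coefficient count, and the file name are all assumptions,
and the single four-state Gaussian HMM is only a stand-in
for the speaker-dependent event models.

```python
# Sketch of the per-frame features described in the abstract:
# F0 + RMS energy (baseline), optionally augmented with
# cepstral coefficients, each with first and second
# derivatives appended. Frame sizes and n_mfcc are
# illustrative, not the values used in the original work.
import numpy as np
import librosa
from hmmlearn import hmm

HOP, FRAME = 160, 640  # 10 ms hop, 40 ms window at 16 kHz (assumed)

def frame_features(path, use_cepstra=True, n_mfcc=12):
    y, sr = librosa.load(path, sr=16000)

    # Fundamental frequency; unvoiced frames come back as NaN.
    f0, _, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr,
                            frame_length=FRAME, hop_length=HOP)
    f0 = np.nan_to_num(f0)[None, :]             # (1, T)

    # RMS energy per frame.
    rms = librosa.feature.rms(y=y, frame_length=FRAME,
                              hop_length=HOP)   # (1, T)

    feats = [f0, rms]
    if use_cepstra:
        feats.append(librosa.feature.mfcc(y=y, sr=sr,
                                          n_mfcc=n_mfcc,
                                          n_fft=FRAME,
                                          hop_length=HOP))

    # Align frame counts and stack the static features.
    T = min(f.shape[1] for f in feats)
    base = np.vstack([f[:, :T] for f in feats])

    # Append first and second derivatives of every feature.
    full = np.vstack([base,
                      librosa.feature.delta(base, order=1),
                      librosa.feature.delta(base, order=2)])
    return full.T                               # (T, n_features)

# Toy stand-in for the event detector: one Gaussian HMM whose
# four hidden states play the role of accent, boundary,
# connection, and silence. "utterance.wav" is hypothetical.
X = frame_features("utterance.wav")
detector = hmm.GaussianHMM(n_components=4, covariance_type="diag",
                           n_iter=20, random_state=0)
detector.fit(X)
states = detector.predict(X)                    # per-frame labels
```

A faithful replication would instead train one model per
event class on labelled data and decode with a network of
those models, rather than fitting a single unsupervised HMM.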