Auditory speaker recognition: a theoretical and experimental study
Date
1980
Author
Brown, Roger S.
Abstract
Speaker recognition is defined as the ability to recognise
a speaker's identity on the basis of hearing a sample of his speech.
Previous approaches to the subject have concentrated on the experimental
manipulation in isolation of acoustic features of the speech signal.
The theoretical approach adopted here attempts to provide a
conceptual framework for speaker recognition, in which emphasis is laid
on auditory speaker recognition (as opposed to speaker recognition by
machine or by the visual examination of spectrograms ("voiceprints")).
The everyday use of speaker recognition is discussed in contrast to
the possible artificialities of experimental formats. The nature and
utilisation of phonetic speaker-characterising features of voice are
examined within the context of
(i) other levels of features (syntactic, semantic,
lexical, etc.)
(ii) other forms of indexical information (sex, age,
regional origin, social status, etc.) and
(iii) other identity characteristics (names, physical
appearance, etc.).
Attention is also focussed on two variables in the speaker recognition
process which have been relatively neglected by previous writers and
researchers:
(i) The nature and implications of differences in the
tasks which listeners perform. The culmination
of this discussion is a model in Boolean logic of
the decision-processes involved in speaker recognition.
(ii) The possible effects caused by differences in the
number, background and training of listeners.
The experimental approach adopted exploits the simultaneous
manipulation of parameters made possible by the use of synthetic speech.
The relative weighting rather than the absolute potentiality of parameters
as speaker-characterising features can thus be examined. Results from
voice similarity judgment experiments employing a factorial design
indicate that:
(i) the parameters of mean pitch, mean formant
position and formant bandwidth are important
for speaker recognition, and
(ii) despite overall performance differences in
judgments of similarity and difference, the
responses of the individual listeners show
comparable reactions to factorial changes.
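
As an illustration of the factorial design mentioned above, the following minimal sketch enumerates every combination of the three parameters named in the results and pairs the resulting synthetic voices for similarity judgment. The two-level values, the parameter names and the helper functions are hypothetical; the abstract does not state the levels or the pairing scheme actually used.

```python
from itertools import combinations, product

# Hypothetical two-level settings for each parameter (assumed values,
# not taken from the thesis).
FACTORS = {
    "mean_pitch_hz": (100, 130),
    "mean_formant_shift_pct": (0, 10),
    "formant_bandwidth_hz": (50, 100),
}


def factorial_voices(factors):
    """Return one synthetic-voice setting per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def changed_factors(voice_a, voice_b):
    """List the parameters on which two voice settings differ."""
    return [name for name in voice_a if voice_a[name] != voice_b[name]]


voices = factorial_voices(FACTORS)      # full 2 x 2 x 2 design
pairs = list(combinations(voices, 2))   # every unordered pair of distinct voices

for voice_a, voice_b in pairs[:3]:
    print(changed_factors(voice_a, voice_b))
```

Because each pair differs on a known subset of parameters, similarity ratings can be compared across pairs to estimate the relative weighting of each parameter rather than its effect in isolation.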