Automatic Speech Recognition for ageing voices
dc.contributor.advisor
Renals, Steve
en
dc.contributor.author
Vipperla, Ravichander
en
dc.contributor.sponsor
Scottish Funding Council
en
dc.contributor.sponsor
University of Edinburgh
en
dc.date.accessioned
2012-01-18T09:56:31Z
dc.date.available
2012-01-18T09:56:31Z
dc.date.issued
2011-11-24
dc.description.abstract
With ageing, human voices undergo several changes which are typically characterised
by increased hoarseness, breathiness, changes in articulatory patterns and slower speaking
rate. The focus of this thesis is to understand the impact of ageing on Automatic
Speech Recognition (ASR) performance and improve the ASR accuracies for older
voices.
Baseline results on three corpora indicate that the word error rates (WER) for older
adults are significantly higher than those of younger adults and the decrease in accuracies
is higher for male speakers than for female speakers.
Acoustic parameters such as jitter and shimmer that measure glottal source disfluencies
were found to be significantly higher for older adults. However, the hypothesis
that these changes explain the differences in WER for the two age groups is proven incorrect.
Experiments with artificial introduction of glottal source disfluencies in speech
from younger adults do not display a significant impact on WERs. Changes in fundamental
frequency, observed quite often in older voices, have a marginal impact on ASR
accuracies.
Analysis of phoneme errors between younger and older speakers shows a pattern
of certain phonemes, especially lower vowels, being more affected by ageing. These
changes, however, are seen to vary across speakers. Another factor that is strongly associated
with ageing voices is a decrease in the rate of speech. Experiments to analyse
the impact of slower speaking rate on ASR accuracies indicate that insertion errors
increase when decoding slower speech with models trained on relatively faster speech.
We then propose a way to characterise speakers in acoustic space based on speaker
adaptation transforms and observe that speakers (especially males) can be segregated
with reasonable accuracies based on age. Inspired by this, we look at supervised hierarchical
acoustic models based on gender and age. Significant improvements in word
accuracies are achieved over the baseline results with such models. The idea is then extended
to construct unsupervised hierarchical models which also outperform the baseline
models by a good margin.
Finally, we hypothesise that the ASR accuracies can be improved by augmenting
the adaptation data with speech from acoustically closest speakers. A strategy to select
the augmentation speakers is proposed. Experimental results on two corpora indicate
that the hypothesis holds true only when the amount of available adaptation data is limited
to a few seconds. The efficacy of such a speaker selection strategy is analysed for both
younger and older adults.
en
dc.identifier.uri
http://hdl.handle.net/1842/5725
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Ravichander Vipperla, Steve Renals, and Joe Frankel. Longitudinal study of ASR performance on ageing voices. In Proceedings of Interspeech, Brisbane, 2008.
en
dc.relation.hasversion
Ravichander Vipperla, Maria Wolters, Kallirroi Georgila, and Steve Renals. Speech input from older users in smart environments: Challenges and perspectives. In Proc. HCI International: Universal Access in Human-Computer Interaction. Intelligent and Ubiquitous Interaction Environments, number 5615 in Lecture Notes in Computer Science. Springer, 2009.
en
dc.relation.hasversion
Maria Wolters, Ravichander Vipperla, and Steve Renals. Age Recognition for Spoken Dialogue Systems: Do We Need It? In Proceedings of Interspeech, Brighton, 2009.
en
dc.relation.hasversion
Ravichander Vipperla, Steve Renals, and Joe Frankel. Ageing voices: The effect of changes in voice parameters on ASR performance. EURASIP Journal on Audio, Speech and Music Processing, 2010.
en
dc.relation.hasversion
Ravichander Vipperla, Steve Renals, and Joe Frankel. Augmentation of adaptation data. In Proceedings of Interspeech, Makuhari, 2010.
en
dc.subject
automatic speech recognition
en
dc.subject
ageing voices
en
dc.subject
voice analysis
en
dc.subject
speaker adaptation
en
dc.title
Automatic Speech Recognition for ageing voices
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en