Automatic Speech Recognition for ageing voices

Vipperla, Ravichander

dc.contributor.advisor

Renals, Steve

en

dc.contributor.author

Vipperla, Ravichander

en

dc.contributor.sponsor

Scottish Funding Council

en

dc.contributor.sponsor

University of Edinburgh

en

dc.date.accessioned

2012-01-18T09:56:31Z

dc.date.available

2012-01-18T09:56:31Z

dc.date.issued

2011-11-24

dc.description.abstract

With ageing, human voices undergo several changes, typically characterised by increased hoarseness, breathiness, changes in articulatory patterns and a slower speaking rate. The focus of this thesis is to understand the impact of ageing on Automatic Speech Recognition (ASR) performance and to improve ASR accuracy for older voices. Baseline results on three corpora indicate that the word error rates (WER) for older adults are significantly higher than those of younger adults, and that the decrease in accuracy is larger for male speakers than for female speakers. Acoustic parameters such as jitter and shimmer, which measure glottal source disfluencies, were found to be significantly higher for older adults. However, the hypothesis that these changes explain the differences in WER between the two age groups is shown to be incorrect: experiments that artificially introduce glottal source disfluencies into speech from younger adults do not show a significant impact on WER. Changes in fundamental frequency, observed quite often in older voices, have only a marginal impact on ASR accuracy. Analysis of phoneme errors for younger and older speakers shows that certain phonemes, especially lower vowels, are more affected by ageing. These changes, however, vary across speakers. Another factor strongly associated with ageing voices is a decrease in the rate of speech. Experiments analysing the impact of a slower speaking rate on ASR accuracy indicate that insertion errors increase when slower speech is decoded with models trained on relatively faster speech. We then propose a way to characterise speakers in acoustic space based on speaker adaptation transforms, and observe that speakers (especially males) can be segregated by age with reasonable accuracy. Inspired by this, we look at supervised hierarchical acoustic models based on gender and age. Significant improvements in word accuracy over the baseline results are achieved with such models. The idea is then extended to construct unsupervised hierarchical models, which also outperform the baseline models by a good margin. Finally, we hypothesize that ASR accuracy can be improved by augmenting the adaptation data with speech from acoustically closest speakers. A strategy to select the augmentation speakers is proposed. Experimental results on two corpora indicate that the hypothesis holds true only when the amount of available adaptation data is limited to a few seconds. The efficacy of such a speaker selection strategy is analysed for both younger and older adults.

en

dc.identifier.uri

http://hdl.handle.net/1842/5725

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

Ravichander Vipperla, Steve Renals, and Joe Frankel. Longitudinal study of ASR performance on ageing voices. In Proceedings of Interspeech, Brisbane, 2008.

en

dc.relation.hasversion

Ravichander Vipperla, Maria Wolters, Kallirroi Georgila, and Steve Renals. Speech input from older users in smart environments: Challenges and perspectives. In Proc. HCI International: Universal Access in Human-Computer Interaction. Intelligent and Ubiquitous Interaction Environments, number 5615 in Lecture Notes in Computer Science. Springer, 2009.

en

dc.relation.hasversion

Maria Wolters, Ravichander Vipperla, and Steve Renals. Age Recognition for Spoken Dialogue Systems: Do We Need It? In Proceedings of Interspeech, Brighton, 2009.

en

dc.relation.hasversion

Ravichander Vipperla, Steve Renals, and Joe Frankel. Ageing voices: The effect of changes in voice parameters on ASR performance. EURASIP Journal on Audio, Speech and Music Processing, 2010.

en

dc.relation.hasversion

Ravichander Vipperla, Steve Renals, and Joe Frankel. Augmentation of adaptation data. In Proceedings of Interspeech, Makuhari, 2010.

en

dc.subject

automatic speech recognition

en

dc.subject

ageing voices

en

dc.subject

voice analysis

en

dc.subject

speaker adaptation

en

dc.title

Automatic Speech Recognition for ageing voices

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: graphs.zip
Size: 6.26 MB
Format: Postscript Files

Name: Vipperla2011.pdf
Size: 1.75 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection