Cross-lingual automatic speech recognition using tandem features
dc.contributor.advisor
King, Simon
en
dc.contributor.advisor
Renals, Steve
en
dc.contributor.author
Lal, Partha
en
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2012-01-19T15:11:21Z
dc.date.available
2012-01-19T15:11:21Z
dc.date.issued
2011-11-24
dc.description.abstract
Automatic speech recognition requires many hours of transcribed speech recordings
in order for an acoustic model to be effectively trained. However, recording speech
corpora is time-consuming and expensive, so such quantities of data exist only for
a handful of languages — there are many languages for which little or no data exist.
Given that there are acoustic similarities between different languages, it may be fruitful
to use data from a well-supported source language for the task of training a recogniser
in a target language with little training data.
Since most languages do not share a common phonetic inventory, we propose an
indirect way of transferring information from a source language model to a target language
model. Tandem features, in which class-posteriors from a separate classifier
are decorrelated and appended to conventional acoustic features, are used to do that.
They have the advantage that the language used to train the classifier, typically a Multilayer
Perceptron (MLP), need not be the same as the target language being recognised.
Consistent with prior work, positive results are achieved for monolingual systems in a
number of different languages.
Furthermore, improvements are also shown for the cross-lingual case, in which the
tandem features are generated using a classifier not trained on the target language.
We examine factors which may predict the relative improvements brought about by
tandem features for a given source and target pair. We examine some cross-corpus
normalization issues that naturally arise in multilingual speech recognition and validate
our solution in terms of recognition accuracy and a mutual information measure.
Up to this point in the thesis, the tandem classifier has been a phoneme classifier.
Articulatory features (AFs), represented here as a multi-stream, discrete, multivalued
labelling of speech, can be used as an alternative task. The motivation for this is
that, since AFs are a set of physically grounded categories that are not language-specific,
they may be more suitable for cross-lingual transfer. Then, using either phoneme or
AF classification as our MLP task, we look at training the MLP using data from more
than one language — again, we hypothesise that AF tandem will result in greater improvements
in accuracy. We also examine performance where only limited amounts of
target language data are available, and see how our various tandem systems perform
under those conditions.
en
dc.identifier.uri
http://hdl.handle.net/1842/5773
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Livescu, K., Çetin, Ö., Hasegawa-Johnson, M., King, S., Bartels, C., Borges, N., Kantor, A., Lal, P., Yung, L., Bezman, A., Dawson-Haggerty, S., Woods, B., Frankel, J., Magimai-Doss, M., and Saenko, K. (2007). Articulatory Feature-based Methods for Acoustic and Audio-Visual Speech Recognition: Summary from the 2006 JHU Summer Workshop. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Honolulu, USA.
en
dc.subject
automatic speech recognition
en
dc.subject
tandem features
en
dc.subject
articulatory features
en
dc.title
Cross-lingual automatic speech recognition using tandem features
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en