Cross-lingual automatic speech recognition using tandem features
dc.contributor.advisor
King, Simon
en
dc.contributor.advisor
Renals, Steve
en
dc.contributor.author
Lal, Partha
en
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2012-01-19T15:11:21Z
dc.date.available
2012-01-19T15:11:21Z
dc.date.issued
2011-11-24
dc.description.abstract
Automatic speech recognition requires many hours of transcribed speech recordings
in order for an acoustic model to be effectively trained. However, recording speech
corpora is time-consuming and expensive, so such quantities of data exist only for
a handful of languages — there are many languages for which little or no data exist.
Given that there are acoustic similarities between different languages, it may be fruitful
to use data from a well-supported source language for the task of training a recogniser
in a target language with little training data.
Since most languages do not share a common phonetic inventory, we propose an
indirect way of transferring information from a source language model to a target language
model. Tandem features, in which class-posteriors from a separate classifier
are decorrelated and appended to conventional acoustic features, are used to do that.
They have the advantage that the language used to train the classifier, typically a Multilayer
Perceptron (MLP), need not be the same as the target language being recognised.
Consistent with prior work, positive results are achieved for monolingual systems in a
number of different languages.
Furthermore, improvements are also shown for the cross-lingual case, in which the
tandem features are generated using a classifier not trained on the target language.
We examine factors which may predict the relative improvements brought about by
tandem features for a given source and target pair. We examine some cross-corpus
normalization issues that naturally arise in multilingual speech recognition and validate
our solution in terms of recognition accuracy and a mutual information measure.
Up to this point in the thesis, the tandem classifier has been a phoneme classifier.
Articulatory features (AFs), represented here as a multi-stream, discrete, multivalued
labelling of speech, can be used as an alternative task. The motivation for this is
that, since AFs are a set of physically grounded categories that are not language-specific,
they may be more suitable for cross-lingual transfer. Then, using either phoneme or
AF classification as our MLP task, we look at training the MLP using data from more
than one language — again, we hypothesise that AF tandem will result in greater improvements
in accuracy. We also examine performance where only limited amounts of
target language data are available, and see how our various tandem systems perform
under those conditions.
en
dc.identifier.uri
http://hdl.handle.net/1842/5773
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Livescu, K., Çetin, Ö., Hasegawa-Johnson, M., King, S., Bartels, C., Borges, N., Kantor, A., Lal, P., Yung, L., Bezman, A., Dawson-Haggerty, S., Woods, B., Frankel, J., Magimai-Doss, M., and Saenko, K. (2007). Articulatory Feature-based Methods for Acoustic and Audio-Visual Speech Recognition: Summary from the 2006 JHU Summer Workshop. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Honolulu, USA.
en
dc.subject
automatic speech recognition
en
dc.subject
tandem features
en
dc.subject
articulatory features
en
dc.title
Cross-lingual automatic speech recognition using tandem features
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en