dc.contributor.advisor | Renals, Stephen | |
dc.contributor.advisor | Bell, Peter | |
dc.contributor.author | Klejch, Ondrej | |
dc.date.accessioned | 2021-09-17T11:01:56Z | |
dc.date.available | 2021-09-17T11:01:56Z | |
dc.date.issued | 2020-11-30 | |
dc.identifier.uri | https://hdl.handle.net/1842/38068 | |
dc.identifier.uri | http://dx.doi.org/10.7488/era/1339 | |
dc.description.abstract | The performance of automatic speech recognition systems degrades rapidly when there
is a mismatch between training and testing conditions. One way to compensate for this
mismatch is to adapt an acoustic model to test conditions, for example by performing
speaker adaptation. In this thesis we focus on the discriminative model-based speaker
adaptation approach. The success of this approach relies on having a robust speaker
adaptation procedure – we need to specify which parameters should be adapted and
how they should be adapted. Unfortunately, tuning the speaker adaptation procedure
requires considerable manual effort.
In this thesis we propose to formulate speaker adaptation as a meta-learning task. In
meta-learning, learning occurs on two levels: a learner learns a task-specific model and
a meta-learner learns how to train these task-specific models. In our case, the learner is
a speaker-dependent model and the meta-learner learns to adapt a speaker-independent
model into the speaker-dependent model. Using this formulation, we can automatically learn robust speaker adaptation procedures with gradient descent. In the experiments, we demonstrate that the meta-learning approach learns adaptation
schedules that are competitive with adaptation procedures using handcrafted hyperparameters.
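To make the two-level formulation concrete, the following is a minimal sketch of the idea, assuming PyTorch and a toy linear model trained on random data; it does not show the acoustic models or adaptation parameters used in the thesis, and the meta-learned quantity is simply the inner-loop learning rate. The inner loop (the learner) adapts the speaker-independent weights with a single gradient step, and the outer loop (the meta-learner) backpropagates through that step to update both the weights and the learning rate.

```python
import torch

# Toy "speaker-independent model": a single linear map (illustrative only).
W = torch.randn(4, 8, requires_grad=True)      # speaker-independent weights (meta-learned)
log_lr = torch.zeros(1, requires_grad=True)    # meta-learned inner-loop learning rate

def loss_fn(weights, x, y):
    return torch.nn.functional.mse_loss(x @ weights.t(), y)

meta_opt = torch.optim.Adam([W, log_lr], lr=1e-2)

for step in range(100):
    # Toy adaptation and evaluation batches for one "speaker" (random data here).
    x_adapt, y_adapt = torch.randn(16, 8), torch.randn(16, 4)
    x_eval, y_eval = torch.randn(16, 8), torch.randn(16, 4)

    # Inner loop (learner): one gradient step turns the speaker-independent weights
    # into speaker-dependent weights; create_graph=True keeps the update differentiable.
    adapt_loss = loss_fn(W, x_adapt, y_adapt)
    grad, = torch.autograd.grad(adapt_loss, W, create_graph=True)
    W_speaker = W - log_lr.exp() * grad

    # Outer loop (meta-learner): update the speaker-independent weights and the
    # learning rate so that performance *after* adaptation improves.
    meta_loss = loss_fn(W_speaker, x_eval, y_eval)
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
```

In a real setup the adaptation and evaluation batches come from the same speaker, and the meta-learner can control more than a scalar learning rate, for example which parameters are adapted and how.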
Subsequently, we show that speaker adaptive training can also be formulated as a meta-learning task. In contrast to the traditional approach, which maintains and optimises a copy of the speaker-dependent parameters for each speaker during training, we
embed gradient-based adaptation directly into the training of the acoustic model.
We hypothesise that this formulation should steer the training of the acoustic model
into finding parameters better suited for test-time speaker adaptation. We experimentally compare our approach with test-only adaptation of a standard baseline model and
with SAT-LHUC, which represents a traditional speaker adaptive training method. We
show that the meta-learning speaker adaptive training approach achieves results
comparable to SAT-LHUC. However, neither the meta-learning approach nor SAT-LHUC
outperforms the baseline approach after adaptation.
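For contrast with the meta-learning formulation above, the traditional SAT-LHUC approach maintains and optimises a separate set of speaker-dependent LHUC parameters for every training speaker alongside the shared parameters. Below is a minimal illustrative sketch, again assuming PyTorch with a toy model and random data rather than an actual acoustic model; LHUC rescales hidden-unit activations with a per-speaker amplitude.

```python
import torch

HIDDEN = 32
shared = torch.nn.Sequential(torch.nn.Linear(8, HIDDEN), torch.nn.Sigmoid())
output = torch.nn.Linear(HIDDEN, 4)

# One LHUC vector per training speaker, kept and optimised throughout training.
speakers = ["spk1", "spk2", "spk3"]
lhuc = {s: torch.zeros(HIDDEN, requires_grad=True) for s in speakers}

params = list(shared.parameters()) + list(output.parameters()) + list(lhuc.values())
opt = torch.optim.Adam(params, lr=1e-3)

for step in range(300):
    spk = speakers[step % len(speakers)]
    x, y = torch.randn(16, 8), torch.randn(16, 4)   # toy batch for this speaker

    # LHUC: scale each hidden unit by a speaker-specific amplitude 2*sigmoid(r) in (0, 2),
    # trained jointly with the shared speaker-independent parameters.
    h = shared(x) * (2.0 * torch.sigmoid(lhuc[spk]))
    loss = torch.nn.functional.mse_loss(output(h), y)

    opt.zero_grad()
    loss.backward()
    opt.step()
```

At test time a new speaker receives a fresh LHUC vector, estimated on that speaker's data while the shared parameters stay fixed.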
Consequently, we run a series of experimental ablations to determine why SAT-LHUC does not yield any improvements over the baseline approach. In these
experiments we explore multiple factors, such as neural network architectures, normalisation techniques, activation functions, and optimisers. We find that
SAT-LHUC interferes with batch normalisation, and that it benefits from an increased
hidden layer width and an increased model size. However, the baseline model also benefits from increased capacity; therefore, to obtain the best model it is still
preferable to train a speaker-independent model with batch normalisation. As such, an
effective way of training state-of-the-art SAT-LHUC models remains an open question.
Finally, we show that the performance of unsupervised speaker adaptation can be
further improved by using discriminative adaptation with lattices obtained from a first-pass decoding as supervision, instead of the traditionally used one-best-path transcriptions. We find that the proposed approach enables many more parameters to
be adapted without observable overfitting, and is successful even when the initial
transcription has a WER in excess of 50%. | en |
dc.contributor.sponsor | European Commission | en |
dc.language.iso | en | en |
dc.publisher | The University of Edinburgh | en |
dc.relation.hasversion | Fainberg, J., Klejch, O., Loweimi, E., Bell, P., and Renals, S. (2019). Acoustic model adaptation from raw waveforms with SincNet. In ASRU. | en |
dc.relation.hasversion | Fainberg, J., Klejch, O., Renals, S., and Bell, P. (2019). Lattice-based lightly-supervised acoustic model training. In Interspeech. | en |
dc.relation.hasversion | Klejch, O., Bell, P., and Renals, S. (2016). Punctuated transcription of multi-genre broadcasts using acoustic and lexical approaches. In SLT. | en |
dc.relation.hasversion | Klejch, O., Bell, P., and Renals, S. (2017). Sequence-to-sequence models for punctuated transcription combining lexical and acoustic features. In ICASSP. | en |
dc.relation.hasversion | Klejch, O., Fainberg, J., and Bell, P. (2018). Learning to adapt: a meta-learning approach for speaker adaptation. In Interspeech. | en |
dc.relation.hasversion | Klejch, O., Fainberg, J., Bell, P., and Renals, S. (2019). Lattice-based unsupervised test-time adaptation of neural network acoustic models. arXiv preprint arXiv:1906.11521. | en |
dc.relation.hasversion | Klejch, O., Fainberg, J., Bell, P., and Renals, S. (2019). Speaker adaptive training using model agnostic meta-learning. In ASRU. | en |
dc.relation.hasversion | Liepins, R., Germann, U., Barzdins, G., Birch, A., Renals, S., Weber, S., van der Kreeft, P., Bourlard, H., Prieto, J., Klejch, O., et al. (2017). The SUMMA platform prototype. In Software Demonstrations ACL. | en |
dc.relation.hasversion | Roth, J., Chaudhuri, S., Klejch, O., Marvin, R., Gallagher, A., et al. (2019). AVA ActiveSpeaker: An audio-visual dataset for active speaker detection. arXiv preprint arXiv:1901.01342. | en |
dc.relation.hasversion | Tsunoo, E., Klejch, O., Bell, P., and Renals, S. (2017). Hierarchical recurrent neural network for story segmentation using fusion of lexical and acoustic features. In ASRU. | en |
dc.subject | automatic speech recognition | en |
dc.subject | speaker adaptation | en |
dc.subject | meta-learning | en |
dc.title | Learning to adapt: meta-learning approaches for speaker adaptation | en |
dc.type | Thesis or Dissertation | en |
dc.type.qualificationlevel | Doctoral | en |
dc.type.qualificationname | PhD Doctor of Philosophy | en |