
dc.contributor.advisor: Renals, Stephen
dc.contributor.advisor: Ghoshal, Arnab
dc.contributor.author: Lu, Liang
dc.date.accessioned: 2013-11-07T16:04:14Z
dc.date.available: 2013-11-07T16:04:14Z
dc.date.issued: 2013-11-28
dc.identifier.uri: http://hdl.handle.net/1842/8065
dc.description.abstract: In most state-of-the-art speech recognition systems, Gaussian mixture models (GMMs) are used to model the density of the emitting states in the hidden Markov models (HMMs). In a conventional system, the model parameters of each GMM are estimated directly and independently given the alignment. This results in a large number of model parameters to be estimated and, consequently, a large amount of training data is required to fit the model. In addition, different sources of acoustic variability that affect the accuracy of a recogniser, such as pronunciation variation, accent, speaker characteristics and environmental noise, are only weakly modelled and factorised by adaptation techniques such as maximum likelihood linear regression (MLLR), maximum a posteriori (MAP) adaptation and vocal tract length normalisation (VTLN). In this thesis, we discuss an alternative acoustic modelling approach, the subspace Gaussian mixture model (SGMM), which is expected to deal with these two issues better. In an SGMM, the model parameters are derived from low-dimensional model and speaker subspaces that capture phonetic and speaker correlations. Given these subspaces, only a small number of state-dependent parameters are required to derive the corresponding GMMs. Hence, the total number of model parameters can be reduced, which allows acoustic modelling with a limited amount of training data. In addition, the SGMM-based acoustic model factorises the phonetic and speaker factors, and within this framework other sources of acoustic variability may also be explored. In this thesis, we propose a regularised model estimation for SGMMs, which avoids overtraining when the training data is sparse. We also take advantage of the structure of SGMMs to explore cross-lingual acoustic modelling for low-resource speech recognition. Here, the model subspace is estimated from out-of-domain data and ported to the target-language system. In this case, only the state-dependent parameters need to be estimated, which relaxes the requirement on the amount of training data. To improve the robustness of SGMMs against environmental noise, we propose to apply the joint uncertainty decoding (JUD) technique, which is shown to be efficient and effective. We report experimental results on the Wall Street Journal (WSJ) database and the GlobalPhone corpora to evaluate the regularisation and cross-lingual modelling of SGMMs. Noise compensation using JUD for SGMM acoustic models is evaluated on the Aurora 4 database.
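The subspace derivation summarised in the abstract can be sketched in the standard SGMM formulation (the symbols below are illustrative and not taken from this record): for HMM state \(j\), substate \(m\) and shared Gaussian index \(i\), the state-dependent vectors \(\mathbf{v}_{jm}\) and a speaker vector \(\mathbf{v}^{(s)}\) generate the full GMM parameters through the globally shared subspace matrices \(\mathbf{M}_i\), \(\mathbf{N}_i\) and weight projections \(\mathbf{w}_i\).

```latex
% Means from the model (phonetic) and speaker subspaces:
\mu_{jmi}^{(s)} = \mathbf{M}_i \mathbf{v}_{jm} + \mathbf{N}_i \mathbf{v}^{(s)}
% Mixture weights via a log-linear model over the same state vectors:
w_{jmi} = \frac{\exp\!\left(\mathbf{w}_i^{\top} \mathbf{v}_{jm}\right)}
               {\sum_{i'=1}^{I} \exp\!\left(\mathbf{w}_{i'}^{\top} \mathbf{v}_{jm}\right)}
```

Only the low-dimensional \(\mathbf{v}_{jm}\) are state-specific; \(\mathbf{M}_i\), \(\mathbf{N}_i\), \(\mathbf{w}_i\) and the covariances \(\Sigma_i\) are shared across all states, which is what permits the cross-lingual porting of the subspace described above.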
dc.language.iso: en
dc.publisher: The University of Edinburgh
dc.relation.hasversion: Lu, L., Chin, K., Ghoshal, A., and Renals, S. (2012). Noise compensation for subspace Gaussian mixture models. In Proc. INTERSPEECH.
dc.relation.hasversion: Lu, L., Chin, K., Ghoshal, A., and Renals, S. (2013). Joint uncertainty decoding for noise robust subspace Gaussian mixture models. IEEE Transactions on Audio, Speech, and Language Processing.
dc.relation.hasversion: Lu, L., Ghoshal, A., and Renals, S. (2011). Regularized subspace Gaussian mixture models for cross-lingual speech recognition. In Proc. IEEE ASRU.
dc.relation.hasversion: Lu, L., Ghoshal, A., and Renals, S. (2011). Regularized subspace Gaussian mixture models for speech recognition. IEEE Signal Processing Letters, 18(7):419–422.
dc.relation.hasversion: Lu, L., Ghoshal, A., and Renals, S. (2012). Joint uncertainty decoding with unscented transforms for noise robust subspace Gaussian mixture models. In Proc. SAPASCALE Workshop.
dc.relation.hasversion: Lu, L., Ghoshal, A., and Renals, S. (2012). Maximum a posteriori adaptation of subspace Gaussian mixture models for cross-lingual speech recognition. In Proc. ICASSP.
dc.relation.hasversion: Lu, L., Ghoshal, A., and Renals, S. (2013). Noise adaptive training for subspace Gaussian mixture models. In Proc. INTERSPEECH.
dc.subject: subspace model
dc.subject: speech recognition
dc.subject: noise
dc.subject: multilingual
dc.title: Subspace Gaussian mixture models for automatic speech recognition
dc.type: Thesis or Dissertation
dc.type.qualificationlevel: Doctoral
dc.type.qualificationname: PhD Doctor of Philosophy

