
dc.contributor.advisor: Renals, Stephen
dc.contributor.advisor: McInnes, Fergus
dc.contributor.advisor: Yamagishi, Junichi
dc.contributor.author: Gangireddy, Siva Reddy
dc.date.accessioned: 2018-03-26T12:53:30Z
dc.date.available: 2018-03-26T12:53:30Z
dc.date.issued: 2017-07-07
dc.identifier.uri: http://hdl.handle.net/1842/28990
dc.description.abstract: The goal of this thesis is to advance the use of recurrent neural network language models (RNNLMs) for large vocabulary continuous speech recognition (LVCSR). RNNLMs are currently state-of-the-art and have been shown to consistently reduce the word error rates (WERs) of LVCSR tasks when compared to other language models. In this thesis we propose various advances to RNNLMs: improved learning procedures, context enhancement, and adaptation. We learned better parameters by a novel pre-training approach and enhanced the context using prosody and syntactic features. We present a pre-training method for RNNLMs, in which the output weights of a feed-forward neural network language model (NNLM) are shared with the RNNLM. This is accomplished by first fine-tuning the weights of the NNLM, which are then used to initialise the output weights of an RNNLM with the same number of hidden units. To investigate the effectiveness of the proposed pre-training method, we carried out text-based experiments on the Penn Treebank Wall Street Journal data and ASR experiments on the TED lectures data. Across the experiments, we observe small but significant improvements in perplexity (PPL) and ASR WER. Next, we present unsupervised adaptation of RNNLMs. We adapted the RNNLMs to a target domain (topic, genre, or television programme) at test time using ASR transcripts from first-pass recognition. We investigated two approaches to adapting the RNNLMs. In the first approach the forward-propagating hidden activations are scaled, using learning hidden unit contributions (LHUC). In the second approach we adapt all parameters of the RNNLM. We evaluated the adapted RNNLMs by reporting the WERs on multi-genre broadcast speech data. We observe small (on average 0.1% absolute) but significant improvements in WER compared to a strong unadapted RNNLM. Finally, we present the context enhancement of RNNLMs using prosody and syntactic features. The prosody features were computed from the acoustics of the context words, and the syntactic features were derived from the surface form of the words in the context. We trained the RNNLMs with word duration, pause duration, final phone duration, syllable duration, syllable F0, part-of-speech tag and Combinatory Categorial Grammar (CCG) supertag features. The proposed context-enhanced RNNLMs were evaluated by reporting PPL and WER on two speech recognition tasks, Switchboard and TED lectures. We observed substantial improvements in PPL (5% to 15% relative) and small but significant improvements in WER (0.1% to 0.5% absolute).
dc.contributor.sponsor: Engineering and Physical Sciences Research Council (EPSRC)
dc.language.iso: en
dc.publisher: The University of Edinburgh
dc.relation.hasversion: Peter Bell, Fergus McInnes, Siva Reddy Gangireddy, Mark Sinclair, Alexandra Birch, and Steve Renals. The UEDIN English ASR system for the IWSLT 2013 evaluation. In Proc. International Workshop on Spoken Language Translation, Heidelberg, Germany, December 2013.
dc.relation.hasversion: Siva Reddy Gangireddy, Fergus McInnes, and Steve Renals. Feed forward pre-training for recurrent neural network language models. In Proc. Interspeech, pages 2620–2624, Singapore, September 2014.
dc.relation.hasversion: Siva Reddy Gangireddy, Steve Renals, Yoshihiko Nankaku, and Akinobu Lee. Prosodically-enhanced recurrent neural network language models. In Proc. Interspeech, pages 2390–2394, Dresden, Germany, September 2015.
dc.relation.hasversion: Siva Reddy Gangireddy, Pawel Swietojanski, Peter Bell, and Steve Renals. Unsupervised adaptation of recurrent neural network language models. In Proc. Interspeech, pages 2333–2337, San Francisco, USA, September 2016.
dc.subject: RNNLM
dc.subject: NNLM
dc.subject: N-grams
dc.subject: language modelling
dc.subject: automatic speech recognition
dc.subject: pre-training
dc.subject: context-enhancement
dc.subject: adaptation
dc.subject: TED Talks
dc.subject: Switchboard
dc.subject: MGB Challenge
dc.subject: prosody features
dc.subject: syntactic features
dc.subject: POS, CCG supertags
dc.subject: LHUC
dc.title: Recurrent neural network language models for automatic speech recognition
dc.type: Thesis or Dissertation
dc.type.qualificationlevel: Doctoral
dc.type.qualificationname: PhD Doctor of Philosophy
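
The abstract above describes two of the thesis's techniques concretely enough to illustrate: pre-training an RNNLM by sharing the output weights of a fine-tuned feed-forward NNLM, and unsupervised LHUC adaptation, in which per-hidden-unit scaling factors on the forward-propagating hidden activations are the only parameters updated on first-pass ASR transcripts. The PyTorch sketch below is a minimal illustration under assumed model sizes and class names, not the thesis's implementation; in particular, the 2*sigmoid(a) parametrisation of the LHUC scale is an assumption based on the standard LHUC formulation.

# Minimal PyTorch sketch (illustrative, not the thesis code) of NNLM-to-RNNLM
# output-weight sharing and LHUC adaptation. Sizes and names are assumptions.
import torch
import torch.nn as nn

VOCAB, EMBED, HIDDEN, CONTEXT = 10000, 128, 256, 3

class FeedForwardNNLM(nn.Module):
    """n-gram style NNLM whose fine-tuned output layer seeds the RNNLM."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMBED)
        self.hidden = nn.Linear(CONTEXT * EMBED, HIDDEN)
        self.output = nn.Linear(HIDDEN, VOCAB)

    def forward(self, ctx_ids):                    # ctx_ids: (batch, CONTEXT)
        h = torch.tanh(self.hidden(self.embed(ctx_ids).flatten(1)))
        return self.output(h)                      # logits over the vocabulary

class RNNLM(nn.Module):
    """Recurrent LM with LHUC scaling of the hidden activations."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMBED)
        self.rnn = nn.RNN(EMBED, HIDDEN, batch_first=True)
        self.output = nn.Linear(HIDDEN, VOCAB)
        # One LHUC amplitude per hidden unit; zeros give a scale of
        # 2*sigmoid(0) = 1, i.e. no change before adaptation.
        self.lhuc = nn.Parameter(torch.zeros(HIDDEN))

    def forward(self, word_ids):                   # word_ids: (batch, seq_len)
        h, _ = self.rnn(self.embed(word_ids))
        h = h * (2.0 * torch.sigmoid(self.lhuc))   # scale hidden activations
        return self.output(h)

nnlm, rnnlm = FeedForwardNNLM(), RNNLM()

# Pre-training: copy the fine-tuned NNLM output weights into an RNNLM that
# has the same number of hidden units.
rnnlm.output.weight.data.copy_(nnlm.output.weight.data)
rnnlm.output.bias.data.copy_(nnlm.output.bias.data)

# Unsupervised adaptation: freeze everything except the LHUC amplitudes and
# minimise cross-entropy on the first-pass ASR transcripts of the target
# show, genre, or topic (the abstract's second approach would instead leave
# all parameters trainable).
for name, p in rnnlm.named_parameters():
    p.requires_grad = (name == "lhuc")
optimizer = torch.optim.SGD([rnnlm.lhuc], lr=0.1)

Sharing only the output weights is possible because the NNLM and the RNNLM use the same number of hidden units and the same vocabulary, so the softmax layer transfers directly while the recurrent parameters are still learned from scratch.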

