Flexible neural architectures for sequence modeling

Krause, Benjamin

dc.contributor.advisor

Renals, Stephen

en

dc.contributor.advisor

Murray, Iain

en

dc.contributor.author

Krause, Benjamin

en

dc.date.accessioned

2020-05-26T13:17:33Z

dc.date.available

2020-05-26T13:17:33Z

dc.date.issued

2020-06-25

dc.description.abstract

Auto-regressive sequence models can estimate the distribution of any type of sequential data. To study sequence models, we consider the problem of language modeling, which entails predicting probability distributions over sequences of text. This thesis improves on previous language modeling approaches by giving models additional flexibility to adapt to their inputs. In particular, we focus on multiplicative LSTM (mLSTM), which, compared with traditional LSTM, has added flexibility to change its recurrent transition function depending on its input, and dynamic evaluation, which helps LSTM (or other sequence models) adapt to the recent sequence history to exploit re-occurring patterns within a sequence. We find that using these adaptive approaches for language modeling improves model predictions by helping models recover from surprising tokens and sequences. mLSTM is a hybrid of a multiplicative recurrent neural network (mRNN) and an LSTM. mLSTM is characterized by its ability to have recurrent transition functions that can vary more for each possible input token, and in our experiments it makes better predictions than LSTM after viewing unexpected inputs. mLSTM also outperformed all previous neural architectures at character-level language modeling. Dynamic evaluation is a method for adapting sequence models to the recent sequence history at inference time using gradient descent, assigning higher probabilities to re-occurring sequential patterns. While dynamic evaluation was often previously viewed as a way of using additional training data, this thesis argues that dynamic evaluation is better thought of as a way of adapting probability distributions to their own predictions. We also explore and develop dynamic evaluation methods with the goals of achieving the best prediction performance and computational/memory efficiency, as well as understanding why these methods work.
Different variants of dynamic evaluation are applied to a number of different architectures, resulting in improvements to language modeling over longer contexts, as well as polyphonic music prediction. Dynamically evaluated models are also able to generate conditional samples that repeat patterns from the conditioning text, and achieve improved generalization in modeling out-of-domain sequences. The added flexibility that dynamic evaluation gives models allows them to recover faster when predicting unexpected sequences. The proposed approaches improve on previous language models by giving them additional flexibility to adapt to their inputs. mLSTM and dynamic evaluation both contributed to improvements to the state of the art in language modeling, and have potential applications to a wider range of sequence modeling problems.
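
The idea of dynamic evaluation described above (score each prediction, then take a gradient step on the token just observed) can be illustrated with a minimal sketch. This is not the thesis's implementation: the bigram softmax model, the zero-initialised weights standing in for a pretrained model, the plain SGD update, the learning rate, and the names `evaluate`, `static_nll`, and `dynamic_nll` are all assumptions chosen to keep the example small.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log-softmax over a 1-D array of logits."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())


def evaluate(tokens, vocab_size, lr=0.0):
    """Average next-token negative log-likelihood of a toy bigram model.

    With lr == 0 the weights stay fixed (static evaluation).
    With lr > 0 the weights are updated by SGD after each prediction
    has been scored (a simplified form of dynamic evaluation).
    """
    # Hypothetical stand-in for pretrained weights: row `prev` holds the
    # logits for the token that follows `prev`.
    W = np.zeros((vocab_size, vocab_size))
    total = 0.0
    for prev, nxt in zip(tokens[:-1], tokens[1:]):
        logp = log_softmax(W[prev])
        total -= logp[nxt]  # score the prediction before adapting
        if lr > 0.0:
            # Gradient of -log p[nxt] w.r.t. the logits is (p - one_hot).
            grad = np.exp(logp)
            grad[nxt] -= 1.0
            W[prev] -= lr * grad  # adapt to the token just observed
    return total / (len(tokens) - 1)


# A sequence with a strongly re-occurring pattern.
tokens = [0, 1, 2, 3] * 25
static_nll = evaluate(tokens, vocab_size=4, lr=0.0)
dynamic_nll = evaluate(tokens, vocab_size=4, lr=0.5)
print(f"static NLL:  {static_nll:.3f}")
print(f"dynamic NLL: {dynamic_nll:.3f}")
```

On this repetitive sequence the adapted model assigns higher probability to the recurring pattern than the fixed model does, so its average loss is lower. Each token is scored before the model is updated on it, which keeps the evaluation a valid measure of predictive performance.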

en

dc.identifier.uri

https://hdl.handle.net/1842/37088

dc.identifier.uri

http://dx.doi.org/10.7488/era/389

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

Krause, B. (2015). Optimizing and contrasting recurrent neural network architectures. Master’s thesis, The University of Edinburgh. https://arxiv.org/abs/1510.04953.

en

dc.relation.hasversion

Krause, B., Damonte, M., Dobre, M., Duma, D., Fainberg, J., Fancellu, F., Kahembwe, E., Cheng, J., and Webber, B. (2017). Edina: Building an open domain socialbot with self-dialogues. Alexa Prize Proceedings.

en

dc.relation.hasversion

Krause, B., Kahembwe, E., Murray, I., and Renals, S. (2018). Dynamic evaluation of neural sequence models. ICML

en

dc.relation.hasversion

Krause, B., Kahembwe, E., Murray, I., and Renals, S. (2019). Dynamic evaluation of transformer language models. arXiv:1904.08378.

en

dc.relation.hasversion

Krause, B., Lu, L., Murray, I., and Renals, S. (2016). Multiplicative LSTM for sequence modelling. arXiv:1609.07959

en

dc.relation.hasversion

Krause, B., Murray, I., Renals, S., and Lu, L. (2017). Multiplicative LSTM for sequence modelling. ICLR Workshop track

en

dc.subject

language modeling

en

dc.subject

multiplicative LSTM

en

dc.subject

mLSTM

en

dc.subject

dynamic evaluation

en

dc.subject

sequence modeling

en

dc.title

Flexible neural architectures for sequence modeling

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: Krause2020.pdf
Size: 823.82 KB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection