Flexible neural architectures for sequence modeling
dc.contributor.advisor
Renals, Stephen
en
dc.contributor.advisor
Murray, Iain
en
dc.contributor.author
Krause, Benjamin
en
dc.date.accessioned
2020-05-26T13:17:33Z
dc.date.available
2020-05-26T13:17:33Z
dc.date.issued
2020-06-25
dc.description.abstract
Auto-regressive sequence models can estimate the distribution of any type of sequential data. To study sequence models, we consider the problem of language modeling, which entails predicting probability distributions over sequences of text. This thesis improves on previous language modeling approaches by giving models additional flexibility to adapt to their inputs. In particular, we focus on multiplicative LSTM (mLSTM), which, compared with traditional LSTM, has added flexibility to change its recurrent transition function depending on its input, and dynamic evaluation, which helps LSTM (or other sequence models) adapt to the recent sequence history to exploit re-occurring patterns within a sequence. We find that these adaptive approaches improve language models' predictions by helping the models recover from surprising tokens and sequences.
mLSTM is a hybrid of a multiplicative recurrent neural network (mRNN) and an LSTM. mLSTM is characterized by recurrent transition functions that can vary more for each possible input token; in our experiments, it makes better predictions than LSTM after viewing unexpected inputs. mLSTM also outperformed all previous neural architectures at character-level language modeling.
Dynamic evaluation is a method for adapting sequence models to the recent sequence history at inference time using gradient descent, assigning higher probabilities to re-occurring sequential patterns. While dynamic evaluation was often previously viewed as a way of using additional training data, this thesis argues that dynamic evaluation is better thought of as a way of adapting probability distributions to their own predictions. We also explore and develop dynamic evaluation methods with the goals of achieving the best prediction performance and computational/memory efficiency, as well as understanding why these methods work. Different variants of dynamic evaluation are applied to a number of different architectures, resulting in improvements to language modeling over longer contexts, as well as polyphonic music prediction. Dynamically evaluated models are also able to generate conditional samples that repeat patterns from the conditioning text, and achieve improved generalization in modeling out-of-domain sequences. The added flexibility that dynamic evaluation gives models allows them to recover faster when predicting unexpected sequences.
The proposed approaches improve on previous language models by giving them additional flexibility to adapt to their inputs. mLSTM and dynamic evaluation both contributed to improvements to the state of the art in language modeling, and have potential applications to a wider range of sequence modeling problems.
en
dc.identifier.uri
https://hdl.handle.net/1842/37088
dc.identifier.uri
http://dx.doi.org/10.7488/era/389
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Krause, B. (2015). Optimizing and contrasting recurrent neural network architectures. Master’s thesis, The University of Edinburgh. https://arxiv.org/abs/1510.04953.
en
dc.relation.hasversion
Krause, B., Damonte, M., Dobre, M., Duma, D., Fainberg, J., Fancellu, F., Kahembwe, E., Cheng, J., and Webber, B. (2017). Edina: Building an open domain socialbot with self-dialogues. Alexa Prize Proceedings.
en
dc.relation.hasversion
Krause, B., Kahembwe, E., Murray, I., and Renals, S. (2018). Dynamic evaluation of neural sequence models. ICML.
en
dc.relation.hasversion
Krause, B., Kahembwe, E., Murray, I., and Renals, S. (2019). Dynamic evaluation of transformer language models. arXiv:1904.08378.
en
dc.relation.hasversion
Krause, B., Lu, L., Murray, I., and Renals, S. (2016). Multiplicative LSTM for sequence modelling. arXiv:1609.07959.
en
dc.relation.hasversion
Krause, B., Murray, I., Renals, S., and Lu, L. (2017). Multiplicative LSTM for sequence modelling. ICLR Workshop track.
en
dc.subject
language modeling
en
dc.subject
multiplicative LSTM
en
dc.subject
mLSTM
en
dc.subject
dynamic evaluation
en
dc.subject
sequence modeling
en
dc.title
Flexible neural architectures for sequence modeling
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name:
- Krause2020.pdf
- Size:
- 823.82 KB
- Format:
- Adobe Portable Document Format

