Flexible neural architectures for sequence modeling
Auto-regressive sequence models can estimate the distribution of any type of sequential data. To study sequence models, we consider the problem of language modeling, which entails predicting probability distributions over sequences of text. This thesis improves on previous language modeling approaches by giving models additional flexibility to adapt to their inputs. In particular, we focus on multiplicative LSTM (mLSTM), which, compared with a traditional LSTM, has added flexibility to change its recurrent transition function depending on its input, and on dynamic evaluation, which helps LSTMs (and other sequence models) adapt to the recent sequence history and exploit re-occurring patterns within a sequence. We find that these adaptive approaches improve language modeling predictions by helping models recover from surprising tokens and sequences.

mLSTM is a hybrid of a multiplicative recurrent neural network (mRNN) and an LSTM. It is characterized by recurrent transition functions that can vary more across possible input tokens, and in our experiments it makes better predictions than an LSTM after viewing unexpected inputs. mLSTM also outperformed all previous neural architectures at character-level language modeling.

Dynamic evaluation is a method for adapting sequence models to the recent sequence history at inference time using gradient descent, assigning higher probabilities to re-occurring sequential patterns. While dynamic evaluation was often previously viewed as a way of using additional training data, this thesis argues that it is better thought of as a way of adapting probability distributions to their own predictions. We also explore and develop dynamic evaluation methods with the goals of achieving the best prediction performance and computational/memory efficiency, as well as understanding why these methods work.
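The mLSTM transition described above can be sketched as follows: a multiplicative intermediate state, formed from projections of the input and the previous hidden state, replaces the previous hidden state in the standard LSTM gate equations, so the effective recurrent transition varies with the input token. This is a minimal illustrative sketch, not the thesis implementation; the weight names in the parameter dictionary (`Wmx`, `Wmh`, etc.) are placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlstm_step(x, h_prev, c_prev, p):
    """One mLSTM step (illustrative). p is a dict of weight matrices
    with placeholder names; biases are omitted for brevity."""
    # mRNN-style multiplicative state: elementwise product of an input
    # projection and a hidden-state projection, so the transition
    # depends on the current input token
    m = (p['Wmx'] @ x) * (p['Wmh'] @ h_prev)
    # Standard LSTM gates, but conditioned on m instead of h_prev
    i = sigmoid(p['Wix'] @ x + p['Wim'] @ m)   # input gate
    f = sigmoid(p['Wfx'] @ x + p['Wfm'] @ m)   # forget gate
    o = sigmoid(p['Wox'] @ x + p['Wom'] @ m)   # output gate
    c = f * c_prev + i * np.tanh(p['Wcx'] @ x + p['Wcm'] @ m)
    h = o * np.tanh(c)
    return h, c
```

Because `m` is an elementwise product involving the input projection, each input token effectively selects a different recurrent transition, which is the source of the added flexibility discussed above.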
Different variants of dynamic evaluation are applied to a number of different architectures, resulting in improvements to language modeling over longer contexts, as well as to polyphonic music prediction. Dynamically evaluated models are also able to generate conditional samples that repeat patterns from the conditioning text, and achieve improved generalization when modeling out-of-domain sequences. The added flexibility that dynamic evaluation gives models allows them to recover faster when predicting unexpected sequences. The proposed approaches improve on previous language models by giving them additional flexibility to adapt to their inputs. mLSTM and dynamic evaluation both contributed to improvements to the state of the art in language modeling, and have potential applications to a wider range of sequence modeling problems.
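The core dynamic evaluation idea, adapting a model to the recent sequence history by gradient descent at inference time, can be illustrated with a toy unigram softmax model, where the cross-entropy gradient is available in closed form. This is a sketch of the principle only, not the adaptation rules developed in the thesis; the function name and learning rate are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def dynamic_eval_log_prob(logits, tokens, lr=0.5):
    """Score a token sequence under a unigram softmax model, taking one
    gradient step on each token's cross-entropy loss after predicting it
    (illustrative sketch of dynamic evaluation)."""
    logits = logits.copy()
    total_log_prob = 0.0
    probs_seen = []
    for t in tokens:
        p = softmax(logits)
        probs_seen.append(p[t])          # probability assigned before adapting
        total_log_prob += np.log(p[t])
        # gradient of -log p[t] w.r.t. logits is (p - onehot(t));
        # descending it adapts the model toward the recent history
        grad = p.copy()
        grad[t] -= 1.0
        logits -= lr * grad
    return total_log_prob, probs_seen
```

With a uniform initial model and a repeated token, each occurrence receives a higher probability than the last, which mirrors how dynamically evaluated models assign rising probability to re-occurring patterns in a sequence.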