Efficient methods and architectures for deep neural network sequence models

Mbabazi, Emmanuel Kahembwe


dc.contributor.advisor

Ramamoorthy, Subramanian

dc.contributor.advisor

Li, Zhibin

dc.contributor.author

Mbabazi, Emmanuel Kahembwe

dc.contributor.sponsor

Engineering and Physical Sciences Research Council (EPSRC)

en

dc.date.accessioned

2022-01-20T11:39:55Z

dc.date.available

2022-01-20T11:39:55Z

dc.date.issued

2021-11-30

dc.description.abstract

The recent resurgence of neural networks, termed "Deep Learning", has led to a reinvigoration of the artificial intelligence research field and all related sub-fields, from robotics and vision to natural language processing and understanding. In the last decade, this field has seen incredible breakthroughs, primarily driven by improvements to computing capability that have allowed for ever larger neural network architectures. The key driving force behind this resurgence has been the graphics processing unit (GPU), and as deep neural networks (DNNs) get ever larger, efficiency has become a bottleneck. Even with ample GPUs and significant financial resources, state-of-the-art neural network models and methods are out of reach for most scientists. The significance of this challenge becomes apparent when attempting to use DNNs on video, the most consumed form of data and media. Modelling high-dimensional data such as video is already computationally expensive and challenging even with small neural networks. With the 2020 Coronavirus pandemic, production and consumption of video have greatly increased as the global business population moves to working and interacting online. The low cost of video production and transmission is quickly making it the most common medium of digital communication for socially distanced humans. Video is also often the cheapest and most detailed source of information relied upon in fields such as robotics, for driverless cars, drones and teleoperated machines. As such, being able to efficiently model such data is of paramount importance to the field of AI. In this thesis, we tackle the issue of efficient modelling of complex high-dimensional sequential data such as video and language. We address this problem on two fronts: computational efficiency and algorithmic efficiency. On the computational front, we propose a design methodology that significantly lowers the cost of video modelling tasks while improving performance. To enable this, we bring to bear the tools of Hessian analysis in the most comprehensive analysis of generative video models to date. We then go on to tackle sequential modelling from an algorithmic efficiency perspective. We propose methods that use the temporal dynamics of sequential data to improve modelling performance post-training. We highlight the new capabilities enabled when optimization is not restricted to training scenarios and conjecture that intelligent systems should never stop training. In a collaborative effort, we propose similar approaches for natural language modelling. To conclude, we demonstrate, with a single commodity GPU, that our proposed methods and architectures realise state-of-the-art results, often surpassing the performance of models trained on hundreds of GPUs at significant financial cost.

en

dc.identifier.uri

https://hdl.handle.net/1842/38446

dc.identifier.uri

http://dx.doi.org/10.7488/era/1710

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.subject

deep learning

en

dc.subject

video analysis

en

dc.subject

sequence modeling

en

dc.subject

natural language processing

en

dc.subject

neural networks

en

dc.title

Efficient methods and architectures for deep neural network sequence models

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle


Name: Mbabazi2021.pdf
Size: 2.65 MB
Format: Adobe Portable Document Format


This item appears in the following Collection(s)

Informatics thesis and dissertation collection