Edinburgh Research Archive


Efficient neural networks

View/Open
TurnerJ_2022.pdf (4.372 MB)
Date
23/08/2022
Author
Turner, Jack
Abstract
Improving the efficiency of neural networks has great potential impact due to their wide range of possible use cases and their high levels of arithmetic intensity. As neural network designs evolve and hardware grows more complex, the goal of modern deep learning compilers will be to exploit opportunities for optimisation at all levels of the deployment stack, from high-level choices about neural architectures all the way down to low-level decisions on code generation. This thesis decomposes neural network designs into three core components: skeletons, blocks, and operations. Each component is addressed individually, and the interactions between optimisations applied at different layers of the deployment stack are examined. First considered are optimisation schemes for neural network skeletons, and it is shown that the commonplace prune-and-finetune pattern has a negative impact on throughput on both CPUs and GPUs. New schemes are developed for downscaling skeletons that preserve hardware performance, yield better accuracy, and avoid the expensive finetuning stage. Secondly, this thesis considers optimisation for neural network blocks. A wealth of research has been dedicated to designing drop-in replacements for neural network blocks that attempt to improve their efficiency. Based on a set of simple drop-ins, this thesis develops a new method for quickly deciding which replacements to put where in a network. It is shown that the algorithm developed can be used more generally to design such blocks from scratch. A core facet of the algorithm is a rejection filter which can be used to guide the kinds of networks proposed. This rejection filter can take the form of simple parameter counts, or more complex compilation metrics such as optimised inference time or levels of data reuse. This provides a potential handle for interaction between the network designer and the optimising compiler. Finally, the thesis considers network operations. Ideas are unified from optimising compilers and network architecture search into a single framework that allows for the generation of new operations, and mutations of network architectures into highly optimised forms.
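
To make the simplest form of rejection filter mentioned in the abstract concrete, the following is a minimal Python/PyTorch sketch of filtering candidate drop-in blocks by parameter count. The block definitions, budget, and function names here are illustrative assumptions, not taken from the thesis itself.

    # Illustrative sketch only: a parameter-count rejection filter for
    # candidate network blocks. The candidates and budget are hypothetical.
    import torch.nn as nn

    def param_count(module: nn.Module) -> int:
        # Total number of trainable parameters in a module.
        return sum(p.numel() for p in module.parameters() if p.requires_grad)

    def rejection_filter(candidate: nn.Module, budget: int) -> bool:
        # Accept a candidate block only if it fits the parameter budget.
        return param_count(candidate) <= budget

    # Hypothetical drop-in candidates for a 64-channel convolutional block.
    candidates = {
        "standard_conv": nn.Conv2d(64, 64, kernel_size=3, padding=1),
        "depthwise_separable": nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise
            nn.Conv2d(64, 64, kernel_size=1),                        # pointwise
        ),
    }

    budget = 10_000  # arbitrary example budget
    for name, block in candidates.items():
        kept = rejection_filter(block, budget)
        print(f"{name}: {param_count(block)} params -> {'keep' if kept else 'reject'}")

With this budget the standard 3x3 convolution (about 37k parameters) is rejected while the depthwise-separable variant (about 4.8k parameters) is kept, showing how a cheap static metric can prune the search space before more expensive compilation metrics, such as optimised inference time, are ever measured.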
URI
https://hdl.handle.net/1842/39326

http://dx.doi.org/10.7488/era/2577
Collections
  • Informatics thesis and dissertation collection
