Deep language models for software testing and optimisation
dc.contributor.advisor
Rajan, Ajitha
dc.contributor.advisor
O'Boyle, Michael
dc.contributor.advisor
Leather, Hugh
dc.contributor.author
Tsimpourlas, Foivos
dc.date.accessioned
2023-06-16T09:48:44Z
dc.date.available
2023-06-16T09:48:44Z
dc.date.issued
2023-06-16
dc.description.abstract
Developing software is difficult. A challenging part of production development is ensuring that programs are correct and fast, two properties addressed by software testing and optimisation. Both tasks still rely on manual effort and expertise, and the recent surge in software applications has made them tedious and time-consuming. In this fast-paced environment, manual testing and optimisation hinder productivity significantly and lead to error-prone or sub-optimal programs that waste energy and frustrate users. In this thesis, we propose three novel approaches to automating software testing and optimisation with modern language models based on deep learning. In contrast to our methods, the few existing techniques in these two domains scale poorly and struggle with real-world applications.
Our first contribution lies in the field of software testing and aims to automate the test oracle problem: the procedure of determining the correctness of test executions. The test oracle still relies largely on human experts. Automating the oracle is a non-trivial task that requires software specifications, or information derived from them, which are often too difficult to extract. We present the first application of deep language models over program execution traces to predict runtime correctness. Our technique classifies test executions of large-scale production codebases as "pass" or "fail". It reduces the number of test inputs an expert must label by 86%, training on only 14% of them and classifying the rest automatically.
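To make the approach concrete, here is a minimal PyTorch sketch of a classifier in this spirit: it embeds a tokenised execution trace, summarises it with an LSTM, and emits a pass/fail verdict. The class name, layer sizes and tokenisation are illustrative assumptions, not the thesis's exact architecture.

import torch
import torch.nn as nn

class TraceClassifier(nn.Module):
    # Sketch: embed a tokenised execution trace and predict pass/fail.
    def __init__(self, vocab_size, embed_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # logits for {pass, fail}

    def forward(self, trace_ids):
        x = self.embed(trace_ids)        # (batch, seq, embed_dim)
        _, (h, _) = self.encoder(x)      # final hidden state summarises the trace
        return self.head(h[-1])          # (batch, 2)

# Train on the small labelled slice (~14% of executions), then label the rest.
model = TraceClassifier(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (8, 256)))  # 8 dummy traces, 256 events each
verdicts = logits.argmax(dim=-1)                    # 0 = pass, 1 = fail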
Our next two contributions improve the effectiveness of compiler optimisation. Compilers optimise programs by applying heuristic-based transformations constructed by compiler engineers. Selecting the right transformations requires extensive knowledge of the compiler, the subject program and the target architecture. Predictive models have been used successfully to automate heuristic construction, but their performance is hindered by training benchmarks that are scarce in both quantity and feature diversity. Our next two contributions address this scarcity by generating human-like synthetic programs that improve the performance of predictive models.
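As a small illustration of what a learned compiler heuristic looks like, the sketch below trains a decision tree to map kernels to a CPU or GPU from static code features. The feature layout and the labelling rule are placeholder assumptions for demonstration; a real model would be trained on features extracted from compiler IR and labels derived from measured runtimes.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Placeholder features per kernel:
# [compute ops, memory ops, coalesced accesses, transfer bytes]
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X[:, 0] > X[:, 1]).astype(int)  # toy rule: compute-bound kernels -> GPU

heuristic = DecisionTreeClassifier(max_depth=4).fit(X, y)
print(heuristic.predict(X[:5]))      # 1 = map kernel to GPU, 0 = keep on CPU

Models like this are only as good as their training benchmarks, which is exactly the gap the following two contributions target.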
Our second contribution is BENCHPRESS, the first steerable deep learning synthesizer for executable compiler benchmarks. BENCHPRESS produces human-like programs that compile at a rate of 87%. It targets parts of the feature space previously unreachable by other synthesizers, addressing the scarcity of high-quality training data for compilers. Introducing BENCHPRESS's synthetic benchmarks into the training data of a device mapping predictive model improves that model's performance by 50%.
BENCHPRESS is limited by its feature-agnostic synthesizer, which requires thousands of random inferences to find a few samples that hit the desired features. Our third contribution addresses this inefficiency. We develop BENCHDIRECT, a directed language model for compiler benchmark generation. BENCHDIRECT synthesizes programs by jointly observing the source-code context and the targeted compiler features, enabling efficient steerable generation on large-scale tasks. Compared to BENCHPRESS, BENCHDIRECT successfully matches 1.8× as many Rodinia target benchmarks, while being up to 36% more accurate and up to 72% faster at targeting three different compiler feature spaces.
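The following sketch conveys the joint-conditioning idea behind directed generation: a target compiler feature vector is projected into the token stream, so that generation attends to both the code context and the desired features. The causal-decoder formulation, names and sizes here are this sketch's assumptions rather than BENCHDIRECT's actual implementation.

import torch
import torch.nn as nn

class DirectedGenerator(nn.Module):
    # Sketch: condition code generation on a target compiler feature vector.
    def __init__(self, vocab, n_feats, dim=256):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.feat = nn.Linear(n_feats, dim)  # project target features to one "token"
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=4)
        self.out = nn.Linear(dim, vocab)

    def forward(self, code_ids, target_feats):
        feat_tok = self.feat(target_feats).unsqueeze(1)      # (batch, 1, dim)
        x = torch.cat([feat_tok, self.tok(code_ids)], dim=1)  # prepend feature token
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.body(x, mask=mask)   # causal attention over [features; code]
        return self.out(h)[:, 1:, :]  # next-token logits for the code positions

gen = DirectedGenerator(vocab=8_192, n_feats=8)
logits = gen(torch.randint(0, 8_192, (4, 128)), torch.rand(4, 8))  # (4, 128, 8192)

Conditioning during generation replaces BENCHPRESS's rejection-style search over thousands of random samples, which is where the reported speed and accuracy gains come from.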
All three contributions demonstrate the exciting potential of deep learning and language models to simplify the testing of programs and the construction of better optimisation heuristics for compilers. The outcomes of this thesis provide developers with tools to keep up with the rapidly evolving landscape of software engineering.
en
dc.identifier.uri
https://hdl.handle.net/1842/40677
dc.identifier.uri
http://dx.doi.org/10.7488/era/3438
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
“Supervised learning over test executions as a test oracle”, F. Tsimpourlas, M. Allamanis, A. Rajan, SAC 2021
en
dc.relation.hasversion
“Embedding and classifying test execution traces using neural networks”, F. Tsimpourlas, G. Rooijackers, A. Rajan, M. Allamanis, IET Software 2022
en
dc.relation.hasversion
“BenchPress: A Deep Active Benchmark Generator”, F. Tsimpourlas, P. Petoumenos, M. Xu, C. Cummins, K. Hazelwood, A. Rajan, H. Leather, PACT 2022
en
dc.relation.hasversion
“A design space exploration framework for convolutional neural networks implemented on edge devices”, F. Tsimpourlas, L. Papadopoulos, A. Bartsokas, D. Soudris, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37(11), 2018, 2212–2221
en
dc.relation.hasversion
“BenchDirect: A Directed Language Model for Compiler Benchmarks”, F. Tsimpourlas, P. Petoumenos, M. Xu, C. Cummins, K. Hazelwood, A. Rajan, H. Leather, 2023
en
dc.subject
software production development
en
dc.subject
software testing and optimisation
en
dc.subject
test oracle
en
dc.subject
deep language model
en
dc.subject
compiler optimisation
en
dc.subject
BenchPress
en
dc.title
Deep language models for software testing and optimisation
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: Tsimpourlas2023.pdf
- Size: 4.91 MB
- Format: Adobe Portable Document Format