Reducing the cost of heuristic generation with machine learning
Item Status
Embargo End Date
Date
Authors
Abstract
The space of compile-time transformations and or run-time options which can improve
the performance of a given code is usually so large as to be virtually impossible
to search in any practical time-frame. Thus, heuristics are leveraged which can suggest
good but not necessarily best configurations. Unfortunately, since such heuristics are
tightly coupled to processor architecture performance is not portable; heuristics must
be tuned, traditionally manually, for each device in turn. This is extremely laborious
and the result is often outdated heuristics and less effective optimisation.
Ideally, to keep up with changes in hardware and run-time environments a fast and
automated method to generate heuristics is needed. Recent works have shown that
machine learning can be used to produce mathematical models or rules in their place,
which is automated but not necessarily fast. This thesis proposes the use of active
machine learning, sequential analysis, and active feature acquisition to accelerate the
training process in an automatic way, thereby tackling this timely and substantive issue.
First, a demonstration of the efficiency of active learning over the previously standard
supervised machine learning technique is presented in the form of an ensemble
algorithm. This algorithm learns a model capable of predicting the best processing
device in a heterogeneous system to use per workload size, per kernel. Active machine
learning is a methodology which is sensitive to the cost of training; specifically, it is
able to reduce the time taken to construct a model by predicting how much is expected
to be learnt from each new training instance and then only choosing to learn from those
most profitable examples. The exemplar heuristic is constructed on average 4x faster
than a baseline approach, whilst maintaining comparable quality.
Next, a combination of active learning and sequential analysis is presented which
reduces both the number of samples per training example as well as the number of
training examples overall. This allows for the creation of models based on noisy information,
sacrificing accuracy per training instance for speed, without having a significant
affect on the quality of the final product. In particular, the runtime of high-performance
compute kernels is predicted from code transformations one may want to
apply using a heuristic which was generated up to 26x faster than with active learning
alone.
Finally, preliminary work demonstrates that an automated system can be created
which optimises both the number of training examples as well as which features to
select during training to further substantially accelerate learning, in cases where each
feature value that is revealed comes at some cost.
This item appears in the following Collection(s)

