Inlining in the Glasgow Haskell Compiler: empirical investigation and improvement
Item Status
Embargo End Date
Date
Authors
Hollenbeck, Celeste
Abstract
Inlining has been claimed to be the single most important optimization in Haskell, and also the
optimization receiving the most attention. It is also considered one of the hardest optimizations
to get right. If done poorly, it may cause code bloat, cache misses, and register spilling; yet
if done well, it can provide orders of magnitude better performance in terms of run time and
code size. Despite its importance, the Glasgow Haskell Compiler’s inliner has not undergone
significant change in decades.
This thesis explores the decision space of GHC’s inliner; presents potential approaches to
re-imagining the inliner, along with data and discussion for why they do not work; and additionally
provides a conceptually simple analysis which guides inlining decisions at the function
declaration site by advising the placement of compiler pragmas in source code.
The decision space exploration conducts an empirical investigation of the search space of
GHC’s inlining decisions as it pertains to the randomization of the magic numbers—or handcoded
constant integers—in the inlining heuristic.We demonstrate that randomizing the magic
numbers produces some speedup over default GHC about half of the time, showing ample
room for improvement. Then using iterative search, we produce four configurations into which
we can cluster Haskell packages to produce a 26% mean speedup over 10 benchmark
packages.
Following search space exploration, we investigate three machine learning models to predict
inlining decisions: a genetic algorithm and neural networks from within the middle of the compiler,
and graph neural networks to place pragmas at the source-code level. Our investigation
reveals that although we can train the models with a high degree of accuracy, their predictions
still fail to produce significant performance results when used in practice.
Finally, we devise a technique to place pragmas along hot call-chains—or complete, overapproximated
chains of functions connected to profiled hot spots—which yields a 10% runtime
speedup when combined with existing developer-placed pragmas, and 9% speedup
without existing pragmas. This approach uses packages’ abstract syntax trees to approximate
control flow in quadratic time, whereas a conventional context-insensitive flow analysis would
be cubic.
This item appears in the following Collection(s)

