Understanding Uncertainty in Static Pointer Analysis
View/ Open
Date
2009Author
Ribeiro, Constantino Goncalves
Metadata
Abstract
For programs that make extensive use of pointers, pointer analysis is often critical
for the effectiveness of optimising compilers and tools for reasoning about program
behaviour and correctness. Static pointer analysis has been extensively studied and
several algorithms have been proposed, but these only provide approximate solutions.
As such inaccuracy may hinder further optimisations, it is important to understand
how short these algorithms come of providing accurate information about the points-to
relations.
This thesis attempts to quantify the amount of uncertainty of the points-to relations
that remains after a state-of-the-art context- and flow-sensitive pointer analysis algorithm
is applied to a collection of programs from two well-known benchmark suites:
SPEC integer and MediaBench. This remaining static uncertainty is then compared
to the run-time behaviour. Unlike previous work that compared run-time behaviour
against less accurate context- and flow-insensitive algorithms, the goal of this work is
to quantify the amount of uncertainty that is intrinsic to the applications and that defeat
even the most accurate static analyses.
In a first step to quantify the uncertainties, a compiler framework was proposed and
implemented. It is based on the SUIF1 research compiler framework and the SPAN
pointer analysis package. This framework was then used to collect extensive data
from the static points-to analysis. It was also used to drive a profiled execution of the
programs in order to collect the real run-time points-to data. Finally, the static and the
run-time data were compared.
Experimental results show that often the static pointer analysis is very accurate, but
for some benchmarks a significant fraction, up to 25%, of their accesses via pointer dereferences
cannot be statically fully disambiguated. We find that some 27% of these
de-references turn out to access a single memory location at run time, but many do
access several different memory locations. We find that the main reasons for this are
the use of pointer arithmetic and the fact that some control paths are not taken. The
latter is an example of a source of uncertainty that is intrinsic to the application.