Understanding Uncertainty in Static Pointer Analysis
Ribeiro, Constantino Goncalves
For programs that make extensive use of pointers, pointer analysis is often critical for the effectiveness of optimising compilers and tools for reasoning about program behaviour and correctness. Static pointer analysis has been extensively studied and several algorithms have been proposed, but these only provide approximate solutions. As such inaccuracy may hinder further optimisations, it is important to understand how short these algorithms come of providing accurate information about the points-to relations. This thesis attempts to quantify the amount of uncertainty of the points-to relations that remains after a state-of-the-art context- and flow-sensitive pointer analysis algorithm is applied to a collection of programs from two well-known benchmark suites: SPEC integer and MediaBench. This remaining static uncertainty is then compared to the run-time behaviour. Unlike previous work that compared run-time behaviour against less accurate context- and flow-insensitive algorithms, the goal of this work is to quantify the amount of uncertainty that is intrinsic to the applications and that defeat even the most accurate static analyses. In a first step to quantify the uncertainties, a compiler framework was proposed and implemented. It is based on the SUIF1 research compiler framework and the SPAN pointer analysis package. This framework was then used to collect extensive data from the static points-to analysis. It was also used to drive a profiled execution of the programs in order to collect the real run-time points-to data. Finally, the static and the run-time data were compared. Experimental results show that often the static pointer analysis is very accurate, but for some benchmarks a significant fraction, up to 25%, of their accesses via pointer dereferences cannot be statically fully disambiguated. We find that some 27% of these de-references turn out to access a single memory location at run time, but many do access several different memory locations. We find that the main reasons for this are the use of pointer arithmetic and the fact that some control paths are not taken. The latter is an example of a source of uncertainty that is intrinsic to the application.