making context-sensitive inclusion-based pointer analysis ...reports/papers/201315.pdfmaking...

Making Context-Sensitive Inclusion-based

Pointer Analysis Practical for Compilers

Using Parameterised Summarisation

Yulei Sui1 Sen Ye1 Jingling Xue1

1University of New South Wales, Australia{ysui,yes,jingling}@cse.unsw.edu.au

Technical ReportUNSW-CSE-TR-201315

June 2013

THE UNIVERSITY OF

NEW SOUTH WALES

School of Computer Science and EngineeringThe University of New South Wales

Sydney 2052, Australia

Abstract

Due to its high precision as a flow-insensitive pointer analysis, Andersen’s analysis has beendeployed in some modern optimizing compilers. To obtain improved precision, we describehow to add context sensitivity on top of Andersen’s analysis. The resulting analysis, calledIcon, is efficient to analyse large programs while being sufficiently precise to drive compileroptimisations. Its novelty lies in summarising the side effects of a procedure by using onetransfer function on virtual variables that represent fully parameterised locations accessedvia its formal parameters. As a result, a good balance between efficiency and precision ismade, resulting in Icon that is more powerful than a 1-callsite-sensitive analysis and lessso than a call-path-sensitive analysis (when the recursion cycles in a program are collapsedin all cases).

We have compared Icon with Fulcra, a state of the art Andersen’s analysis that iscontext-sensitive by acyclic call paths, in Open64 (with recursion cycles collapsed in bothcases) using the 16 C/C++ benchmarks in SPEC2000 (totalling 600 KLOC) and 5 Capplications (totalling 2.1 MLOC). Our results demonstrate scalability of Icon and lackof scalability of Fulcra. Fulcra spends over 2 hours in analysing SPEC2000 and failsto run to completion within 5 hours for two of the five applications tested. In contrast,Icon spends just under 7 minutes on the 16 benchmarks in SPEC2000 and just under26 minutes on the same two applications. For the 19 benchmarks analysable by Fulcra,Icon is nearly as accurate as Fulcra in terms of the quality of the built SSA form andthe precision of the discovered alias information.

1 Introduction

Pointer analysis is critical to enable advanced and aggressive compiler optimisations. An-dersen’s inclusion-based analysis [2] is a highly precise pointer analysis, which is flow-insensitive (by ignoring control-flow) and context-insensitive (by ignoring calling contexts).With its recent advances, Andersen’s analysis, which is more precise than Steensgaard’sunification-based analysis [42], is now scalable for large programs [16]. In the latest releaseof the Open64 compiler, its pointer analysis is no longer unification- but rather inclusion-based, performed with offset-based field sensitivity and 1-callsite-sensitive heap cloning(with malloc wrappers being recognised as heap allocation sites). Just like GNU GCC,the overall pointer analysis framework in Open64 remains context-insensitive. However,many compiler optimisations benefit, in both precision and effectiveness, from more pre-cise points-to information if context-sensitivity is also considered. Unfortunately, existingcontext-sensitive versions of Andersen’s analysis are not scalable to millions of lines of code.To the best of our knowledge, there is presently no suitable context-sensitive Andersen’sanalysis that can be deployed in modern compilers such as Open64 and GCC.

In this paper, we introduce a whole-program context-sensitive Andersen’s analysis forC/C++ programs, called Icon, that scales to millions of lines of code. Icon is signifi-cantly faster than the state-of-the art while achieving nearly the same precision. Whileimplemented fully in Open64, our analysis applies to any inclusion-based framework.

The development of Icon has been guided by three design principles:

Precision For an optimizing compiler, its pointer analysis must soundly estimate thepoints-to information in a program. As far as performance improvements are con-cerned, it will be unnecessarily costly to obtain context sensitivity if the context ofa call is identified by its full call path. Instead, we prefer to find a faster solutionthat may not be theoretically powerful but is practically precise enough in drivingcompiler optimisations. Our measurement of precision is the quality of the builtSSA form (in terms of χ and µ operations introduced [8]) and the percentage ofaliases disambiguated. We believe that these two metrics are critical in determiningthe effectiveness of compiler optimisations, such as register allocation [5], instructionscheduling [37], redundancy elimination [4, 53], scratchpad management [25, 26, 56]and speculative parallelization [14, 36, 47, 58].

Efficiency Context sensitivity should be achieved on top of Andersen’s analysis with aslittle overhead as possible for large programs.

Simplicity The solution should be simple conceptually and implementation-wise. To thisend, some recent advances in inclusion-based analysis should be leveraged so thatthe existing code base is maximally reused. Specifically, we prefer to achieve contextsensitivity by staying in the same inclusion-based analysis framework.

1.1 The State of the Art

A context-insensitive pointer analysis does not distinguish between different invocationsof a procedure. When analysing a program, passing parameters and return values betweenprocedures is modeled as assignments without distinguishing their calling contexts. Someprecision loss is illustrated in Figure 1.1. In Figure 1.1(a), the information from one calleris allowed to flow into another. So both x and y may point to a and b. In Figure 1.1(b),the information from the two callsites is merged at the entry of foo so that *p and *q

1

1 void goo() {2 int a, b;3 int *x, *y;4 x = foo(&a);5 y = foo(&b);6 }78 int* foo(int *p) {9 return p;

10 }

1 int g;2 void goo1() {3 int *a, *b;4 foo(&a, &b);5 }6 void goo2() {7 int *c;8 foo(&c, &c);9 }

1011 void foo(int **p, int **q) {12 int *r;13 *p = &g;14 r = *q;15 }

(a) (b)

Figure 1.1: Imprecision in context-insensitive analysis.

are considered to alias in both calling contexts. As a result, r is regarded as pointing tog always even though this is possible only when foo is called inside goo2.

A context-sensitive pointer analysis improves precision by representing calling contextsof a procedure more accurately. An analysis is k-callsite context-sensitive if differentinvocations of a procedure are distinguished by k enclosing callsites in its calling contexts.In practice, the contexts of a method call are often represented by their acyclic call paths,with recursion cycles collapsed or unrolled k times. If k represents the maximal length ofacyclic call paths, then the context sensitivity is achieved with full call paths. In Figure 1.1,precise points-to sets can be obtained with k = 1.

For Java (which relies on heap-only object allocation), a lot of progress has been madein context-sensitive points-to analysis [22, 29, 31, 38, 39, 40, 41, 46, 48, 49, 51, 52, 54, 55].However, we have not seen a corresponding success for C and C++ due to their supportfor pointer operations such as address-of operator & for both heap and stack objects andpointer arithmetic. Many flow- and/or context-sensitive analyses [6, 17, 19, 20, 24, 23, 45,50, 57] have been proposed. According to [1], however, current industrial-strength flow-and context-sensitive versions of Andersen’s analysis are scalable only for small C/C++programs. They are not deployable yet in an optimizing compiler. Even if flow sensitivityis ignored, how to achieve context sensitivity efficiently and precisely on top of Andersen’sanalysis for whole C/C++ programs remains to be an open problem. Among some earlierattempts at making Andersen’s (flow-insensitive) analysis context-sensitive [13, 32, 33,30], Fulcra [32] represents a state-of-the art solution. By cloning (conceptually) thestatements in a procedure that may have interprocedural points-to side effects and theninlining them directly in its callers, Fulcra obtains the most precise points-to informationas an inclusion-based analysis by being context-sensitive with acyclic call paths (i.e., withrecursion cycles collapsed). In obtaining such cloning-based precision, however, Fulcradoes not scale to some large programs [32].

2

188.

amm

p

179.

art

256.

bzip

2

186.

craf

ty

252.

eon

183.

equa

ke

254.

gap

176.

gcc

164.

gzip

181.

mcf

177.

mes

a

197.

pars

er

253.

perlb

mk

300.

twol

f

255.

vort

ex

175.

vpr

gdb-

6.8

http

d-2.

064

icec

ast-2

.3.1

send

mai

l-8.1

4.2

win

e-0.

9.24

aver

age0%

20%

40%

60%

80%

100%#PFPS=0 #PFPS=1 #PFPS=2 #PFPS=3 #PFPS>=4

(a) Distribution of procedures according to the number of pointer formal parameters (PFPs)

188.

amm

p

179.

art

256.

bzip

2

186.

craf

ty

252.

eon

183.

equa

ke

254.

gap

176.

gcc

164.

gzip

181.

mcf

177.

mes

a

197.

pars

er

253.

perlb

mk

300.

twol

f

255.

vort

ex

175.

vpr

gdb-

6.8

http

d-2.

064

icec

ast-2

.3.1

send

mai

l-8.1

4.2

win

e-0.

9.24

aver

age

0%

20%

40%

60%

80%

100%

2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3

#LEVELS=1 #LEVELS=2 #LEVELS=3 #LEVELS>=4

(b) Distribution of PFPs in procedures with two or three PFPs according to the number of levels of indirection

Figure 1.2: Pointer-related information for formal parameters using Andersen’s analysis.

3

1.2 Our Insights

Our key observation is that real-world C/C++ programs are likely to be dominated byprocedures with a small number of pointer formal parameters, which are each dereferencedwith a few levels of indirection. Figure 1.2 plots some pointer-related information amongthe formal parameters of the procedures in the 21 programs used in our experiments,including the 15 C and one C++ benchmarks from SPEC2000 (totalling 600 KLOC) and5 applications (totalling 2.1 MLOC). Most procedures (92.5% on average) have fewer thanfour pointer formal parameters (PFPs). Among the procedures with two (three) PFPs,most of their PFPs, with an average of 89.8% (91.9%), have fewer than four levels ofindirection. This suggests that in the code written by programmers, the formal parametersof a procedure at its entry tend to have few and simple aliasing relations.

This observation has led to the design of our Icon analysis. Its novelty lies in exploit-ing parameterised pointer information to achieve context sensitivity on top of Andersen’sanalysis. For a procedure being analysed, Icon represents the abstract locations passedfrom its callers and accessed by its dereferenced formal parameters using virtual variablesand keeps track of their aliasing relations during the analysis. This enables the interproce-dural points-to side effects of a procedure to be summarised with a transfer function thatmaps each virtual variable to its points-to set. Each points-to relation is guarded by analiasing condition on virtual variables so that context sensitivity can be achieved when itis “transferred” to its callsites. By using virtual variables rather than individual locations,the propagation of points-to information is significantly accelerated.

In theory, Icon is more powerful than a 1-callsite-sensitive analysis but less so thana cloning-based context-sensitive analysis like Fulcra (when the recursion cycles in aprogram are collapsed in all cases). In practice, Icon is significantly faster than Fulcrawhile achieving nearly the same precision. In addition, Icon is simple as it can be imple-mented easily on top of Andersen’s analysis, which is widely used with industrial-strengthimplementations available.

1.3 Contributions

While symbolic names [11, 20, 28, 50, 57] and transfer functions [50, 57] were previouslyused, Icon exploits both in a novel way to obtain a scalable context-sensitive Andersen’sanalysis.

• We introduce a context-sensitive Andersen’s inclusion-based pointer analysis, Icon,that can be directly and easily deployed in modern optimizing compilers. Whileexisting solutions are not scalable, Icon, which is fully implemented in Open64,allows large programs (with millions of lines of code) to be analysed in minutes.

• Icon is the first to achieve context sensitivity on top of Andersen’s analysis bycomputing the transfer function for a procedure based on parameterised pointer in-formation in terms of virtual variables, motivated by the pointer-related informationat the procedure entries in real-world C/C++ programs (Figure 1.2). In addition,we also propose to accelerate Icon by pre-analysis (to discover non-aliased parame-ters) and by propagating points-to information first top-down and then bottom-upon the call graph of a program (to discover aliased parameters eagerly).

• We have evaluated Icon by comparing with Fulcra, a state of the art Andersen’sanalysis that is context-sensitive by acyclic call paths [32], in Open64 (with recursioncycles collapsed in both cases) using 21 C/C++ programs, including 15 C and 1

4

C++ benchmarks in SPEC2000 (600 KLOC) and 5 C applications (2.1 MLOC).Our results demonstrate scalability of Icon and lack of scalability of Fulcra forsome large programs. Fulcra spends over 2 hours in analysing SPEC2000 and failsto run to completion within 5 hours for two of the five applications tested, wine

and gdb. In contrast, Icon spends only just under 7 minutes on SPEC2000 andjust under 26 minutes on both wine and gdb. For the 19 benchmarks analysable byFulcra, Icon is nearly as accurate as Fulcra in terms of the quality of the builtSSA form and the precision of the discovered alias information.

The rest of this paper is organised as follows. Section 2 provides some more backgroundinformation. Section 3 motivates Icon with an example used throughout the paper. Sec-tion 4 introduces the Icon analysis. Section 5 evaluates Icon and compares it with thestate-of-the art. Section 6 discusses related work and Section 7 concludes the paper.

2 Background

We first describe the canonical representation used for a program and then introduceAndersen’s analysis by constraint resolution.

2.1 Program Representation

Four types of statements are considered: x = &y (address), x = ∗y (load), ∗x = y (store)and x = y (copy). Note that x = ∗∗ y can be transformed into x = ∗t and t = ∗y byintroducing a new temporary t. Different fields of a struct are distinguished but arraysare considered monolithic.

void goo(· · · ) {int *a1, . . . , *an, *r;. . .r = foo(a1, . . . , an);. . .

}

int* foo(int *f1, . . . , int *fn) {int *s;. . .return s;

}

void goo(· · · ) {int *a1, . . . , *an, *r;. . .foo(&r, a1, . . . , an);. . .

}

void foo(int **r′, int *f1, . . . , int *fn) {int *s;. . .*r′ = s;

}(a) (b)

Figure 2.1: Passing return values modeled as passing parameters.

Every call contained in a procedure goo has the form r = foo(a1, . . . , an), where r,a1, . . . , an are local variables in goo and foo is a callee (resolved on the fly during pointeranalysis). As Icon is context-sensitive, we must keep track of the (modification) sideeffects made by foo on the variables accessed in goo. For efficiency considerations, themodification side effects on the global variables made in all procedures are tracked globallyin the standard manner as in [28, 32]. In contrast, the non-global variables accessed ingoo may be modified in foo in two ways: (1) via the formal parameters of foo and (2)by passing a return value to goo. Such modification side effects are referred to as the side

5

effects of foo in this paper and tracked by using a transfer function for foo. In order todeal with these two types of side effects on non-globals uniformly, we perform a standardtransformation as shown in Figure 2.1 so that (2) can be dealt with equivalently as (1).

2.2 Constraint-based Andersen’s Analysis

As illustrated in Figure 2.2, Andersen’s analysis discovers points-to information by treatingassignments as subset constraints using a single constraint graph for the entire programuntil a fixed point is reached. When context sensitivity is not considered, passing param-eters and return values between procedures is simply modeled as copy statements.

y=&x;

n=&g;

m=n;

*y=m;

t=*y;

(a) code

Copy Store Load

myt

n

x

{g}

{x}myt

n

x

{g}

{x} {g}

{g}

{g}

(b) Initialization (c) Fixed point

Figure 2.2: Pointer resolution in a constraint graph.

For the code in Figure 2.2(a), Andersen’s analysis starts with the constraint graph givenin Figure 2.2(b). For the address statements, y=&x and n=&g, the points-to informationis directly recorded for their left-hand side variables. For each of the other three types ofstatements, a constraint of an appropriate type is introduced. Then the analysis resolvesloads and stores by adding new copy statements discovered. As y points to x, the twonew copy statements related to x are added as shown in Figure 2.2(c). The new points-toinformation discovered is propagated along the edges in the graph until a fixed point isfound. Finally, t is found to point to g.

3 A Motivating Example

We use an example as shown in Figure 3.1 to illustrate how we achieve context sensitiv-ity on top of Andersen’s analysis. The key novelty is to summarise the side effects of aprocedure in terms of virtual variables that represent fully parameterised pointer infor-mation at its entry. Guided by the three design principles discussed earlier in Section 1,Icon is developed to discover precise points-to information efficiently in a simple way byleveraging the same constraint resolution engine used by Andersen’s analysis.

In the program given in Figure 3.1(a), *p and *q are made aliases but *a and *b arenot aliased after the call to bar. Thus, after the two calls to foo, g will be pointed to by r1

but not by c1. We examine how this fact is discovered by Icon. In a context-insensitiveanalysis, however, g will be conservatively estimated as being pointed to by both r1 andc1.

Icon analyses a program by traversing its call graph (with recursion cycles collapsed)iteratively, first top-down and then bottom-up. In a top-down phase, the points-to infor-mation is propagated downwards from a caller to its callees, with the side effects of all

6

1 int g, i, j, k;2 void main() {3 int **p, **q, **r, **a, **b, **c;4 int *p1, *q1, *r1, *a1, *b1, *c1;5 p = &p1; q = &q1; r = &r1;6 a = &a1; b = &b1; c = &c1;7 p1 = &i; q1 = &j; a1 = &k;8 bar(&p, &q);9 foo(p, q, r); // *p and *q are aliases but *a and *b are not

10 foo(a, b, c);11 }

12 void bar(int ***u, int ***v) {13 *u = *v;14 }15 void foo(int **x, int **y, int **z) {16 int *m, *t;17 m = &g;18 *y = m;19 t = *x;20 *z = t;21 }

(a) An example program

x

Alias Graph: Points-toy z

V x1 = V y

1 = V z1 ={p1, a1} {q1, b1} {r1, c1}

V x2 = V y

2 = V z2 ={i, k} {j} ∅

true

true

true

true

true

truex t z

y m

V y1

V x1

{(true, V z1 )}

V z1

true

true

true

{(true, g)}

{(true, g)}

{(true, V x1 )}

{(true, V x2 )}

{(true, V y1 )}

{(true, V x2 )}

{(true, V x2 )}

Constraint Graph: Copy Store Load

Transfer Function:Transfoo(V y

1 ) = {(true, g)}Transfoo(V z

1 ) = {(true, V x2 )}

x

Alias Graph: Points-toy z

V x1 = Ax,y

1,1V y1 = V z

1 =

Cx,y1,1

Cx,y1,1

Cx,y2,2

Cx,y2,2

{p1, q1, a1} {q1, b1} {r1, c1}

V x2 = V y

2 = V z2 =Ax,y

2,2{i, j, k, g} {j, g} {i, k}true

true

true

true

true

true

x t z

y m

V y1

V x1

Ax,y1,1

{(true, V z1 )}

{(Cx,y1,1, g)}

V z1

Cx,y1,1

Cx,y1,1

true

true

true

{(true, g)}

{(true, g), (Cx,y2,2, A

x,y2,2)}

{(true, V x1 ), (Cx,y

1,1, Ax,y1,1)}

{(true, V x2 ), (Cx,y

2,2, Ax,y2,2)}

{(true, V y1 ), (Cx,y

1,1, Ax,y1,1)}

{(Cx,y1,1, g), (true, V x

2 )}

{(Cx,y1,1, g), (true, V x

2 )}

Constraint Graph: Copy Store Load

Transfer Function:Transfoo(V y

1 ) = {(true, g)}Transfoo(V z

1 ) = {(true, V x2 ), (Cx,y

1,1, g)}

(b) First iteration (c) Second iteration

Figure 3.1: Parameterised summarisation for foo with virtual variables. The points-to setfor a node is not completely shown but it can be read off as the set of all objects reachingthe node by copy edges.

callsites being ignored. In a bottom-up phase, the propagation of the points-to informationis reversed from a callee to its callers, with the points-to side effects of the callee beingsummarised and transferred to its callsites. In both phases, new points-to information isdiscovered in the same constraint resolution framework.

We focus on how foo is summarised and how its summarised side effects are transferredto its two callsites. The call graph comprises a root node, main, and its two child nodes,bar and foo.

3.1 First Iteration

Top-Down The analysis starts with main and then moves to bar and foo. The points-torelations in lines 5 – 7 in main are discovered trivially and propagated downwards into bar

and foo. In our analysis, the points-to relations from different callsites in a procedure aremerged and represented using an alias graph at its entry but handled (at least 1-callsite)

context-sensitively. At this stage, the one for foo is given in Figure 3.1(b). V f1 and V f

2 ,

7

where f ∈ {x,y,z}, stand for ∗f and ∗ ∗f , respectively. These are virtual variables, eachof which represents the set of non-local locations passed from the two callsites of foo andaccessed by the dereferenced parameters in foo.

Each procedure has its own constraint graph except that copy edges are guarded, stat-ing the conditions under which the corresponding relations hold. In the special case whena copy edge is guarded by true, then the corresponding relation always holds. Andersen’sanalysis is applied to the constraint graphs of all procedures combined, except that (1) theside effects of all callsites are ignored and (2) virtual variables are used to parameterisepointer information at procedure entries. In the case of foo, the initial constraint graph(not shown) comprises (1) the constraints corresponding to the statements in lines 17 –20 and (2) the points-to relations in its alias graph. After the fixed-point is reached, weobtain the constraint graph given in Figure 3.1(b).

Bottom-Up Andersen’s analysis is applied separately to the constraint graphs of differ-ent procedures. The interprocedural points-to side effects of a procedure are summarisedand transferred to its callsites. In the case of foo, there is no need to re-run Ander-sen’s analysis as no new points-to information is discovered. The transfer function offoo that maps V

y1 and V z

1 to their points-to sets as shown is obtained. Note that x isnot modified inside. Similarly, the transfer function of bar (not shown) is computed:Transbar(V

u1 ) = {(true, V v

2 )}. When main is analysed, the side effects of foo on V y1 and

V z1 are transferred to its two callsites. For the first callsite, V y

1 and V z1 stand for q1 and

r1, respectively. So q1 has a new target g and r1 a new target i. For the second callsite,V

y1 and V z

1 stand for b1 and c1, respectively. So b1 has a new target g and c1 a newtarget k. Similarly, applying bar’s transfer function to its callsite, we find that p has anew target q1.

3.2 Second Iteration

Top-Down Some new points-to relations that have been discovered in the first itera-tion are propagated downwards. At this stage, foo’s alias graph is the same as that inFigure 3.1(b) except that the contents of its virtual variables have been updated with thenew points-to information, as shown in Figure 3.1(c). In addition, the constraint graphfor foo remains the same as the one in Figure 3.1(b).

Bottom-Up When foo is analysed, its alias graph is inspected. At this stage, thealias graph is the same as the one shown in Figure 3.1(c) except that the four points-

to relations, xCx,y

1,1−→ Ax,y1,1 , y

Cx,y1,1−→ Ax,y

1,1 , V x1

Cx,y2,2−→ Ax,y

2,2 and V y1

Cx,y2,2−→ A

x,y2,2 , do not exist yet.

Once some new points-to information is available in an alias graph, Icon will proceed toidentify and establish all the new aliasing relations between virtual variables and updatethe alias graph accordingly. In the case of foo, V x

1 and V y1 are found to alias. In addition,

V x2 and V y

2 are also found to alias. To represent these new aliasing relations, the fouraforementioned points-to relations are introduced. Here, Cx,y

1,1 is an aliasing condition that

encodes V x1 ∩V

y1 6= ∅. Similarly, Cx,y

2,2 encodes V x2 ∩V

y2 6= ∅. A

x,y1,1 (Ax,y

2,2 ) symbolises the set

of aliased locations between V x1 and V y

1 (V x2 and V y

2 ). Such sets are placeholders as theircontents are not directly used during pointer resolution. Hence, the contents of Ax,y

1,1 and

Ax,y2,2 are not shown in Figure 3.1(c). It is the presence of aliasing relations like Cx,y

1,1 and

Cx,y2,2 that serves to enable points-to information to be propagated conditionally across the

aliased locations.

8

Re-propagating the new points-to information just introduced across the constraintgraph for foo in Figure 3.1(b) yields the fixed point given in Figure 3.1(c). As a result,the transfer function for foo is updated from the one given in Figure 3.1(b) to the onegiven in Figure 3.1(c). During this second round of guarded constraint resolution, the twonew copy edges added for Ax,y

1,1 are guarded by Cxy1,1. This indicates that V z

1 points to g

only when Cx,y1,1 holds.

By applying foo’s transfer function to its first callsite at line 9, we find that Cx,y1,1 =

(V x1 ∩ V

y1 6= ∅) = ({q1} 6= ∅) = true, where V x

1 = {p1,q1} and V y1 = {q1} contain only

the locations propagated from the first callsite. Thus, j and g are new targets of r1. Forthe second callsite at line 10, Cx,y

1,1 = (V x1 ∩ V

y1 6= ∅) = (∅ 6= ∅) = false, where V x

1 = {a1}and V y

1 = {b1} contain only the locations propagated from the second callsite. So no newpoints-to relations are found, implying that g cannot be pointed to by c1. This mappingprocess is explained in more detail in Example 5.

4 The Icon Algorithm

As motivated by our example, the earlier Icon discovers the aliasing information at proce-dure entries, the earlier it can find the points-to sets for more pointers, and consequently,the faster the analysis converges. Therefore, its three components are structured as shownin Algorithm 1.

ALGORITHM 1: The Icon Algorithm

1 Pre-Analysis2 repeat3 Top-Down Analysis4 Bottom-Up Analysis

until5 a fixed point is reached ;

Pre-analysis discovers non-aliased formal parameters and initialises the alias graphsfor all procedures. During a top-down phase, the points-to information in a program ispropagated top-down across its acyclic call graph, with all recursion cycles collapsed onthe fly. During a bottom-up phase, the direction of points-to information propagationis reversed. During each iteration, the top-down phase precedes the bottom-up phase sothat procedure pointers can be resolved as soon as possible as suggested in [32]. In Icon,doing so has an additional benefit: the aliasing relations among the formal parameters ofa procedure can be discovered as soon as possible.

Each procedure has its own constraint graph. A top-down phase processes the con-straint graphs of all procedures together while a bottom-up phase deals with each indi-vidually. In both phases, guarded constraint propagation is used. Icon achieves contextsensitivity by maintaining alias graphs during the analysis, causing the side effects of aprocedure to be summarised iteratively.

Section 4.1 discusses the guarded constraint resolution used in both top-down andbottom-up phases. Section 4.2 introduces pre-analysis. Section 4.3 focuses on top-downanalysis and Section 4.4 on bottom-up analysis. Section 4.5 discusses some salient prop-erties of Icon.

9

4.1 Guarded Constraint Propagation

Much progress has been made on efficiently solving the inclusion-based constraints forpointer analysis [16, 18, 30, 34, 35]. We extend a recent pointer resolution algorithm,which is adapted from Wave Propagation [35] and implemented by the Open64 team inthe Open64 compiler, to perform the guarded constraint resolution in Icon (in the presenceof virtual variables).

Table 4.1: Rules used by guarded constraint resolution.

Statement Constraint Resolution

Address x = &y {(true, y)} ∈ ptr(x)

Copy x = y ptr(x) ⊇true ptr(y)

Load x = ∗y ∀(c′, y′) ∈ ptr(y) : x ⊇c′ y′

Store ∗x = y ∀(c′, x′) ∈ ptr(x) : x′ ⊇c′ y

The rules used by SolveConstraints are given in Table 4.1. The notation ptr(p)stands for the points-to set of a variable p. The notation ptr(a) ⊇c ptr(b) means that foreach (c′, o) ∈ ptr(b), its propagation into a is conditional so that (c ∧ c′, o) ∈ ptr(a). InIcon, each pointed-to object is guarded. When resolving a load or a store, all copy edgesderived are guarded accordingly.

Example 1 When moving from Figure 3.1(b) to Figure 3.1(c), two copy edges are added.

When resolving *y = m in line 18, y points to (Cx,y1,1 , A

x,y1,1 ). So A

x,y1,1

Cx,y1,1⇐= m is added. When

resolving t = *x in line 19, x points to (Cx,y1,1 , A

x,y1,1 ). As a result, t

Cx,y1,1⇐= Ax,y

1,1 is added.

4.2 Pre-Analysis

Steensgaard’s unification-based analysis [42] is used to bootstrap Icon. By interpretingassignments as equality rather than subset constraints, Steensgaard’s analysis is less precisebut significantly faster than Andersen’s analysis. Pre-analysis serves two purposes. First,many formal parameters that do not alias, as observed in Figure 1.2, are discovered,avoiding unnecessary aliasing tests later. Second, an alias graph for each procedure thatis expressed in terms of virtual variables, as illustrated in our example, is initialised. Thedomain of a virtual variable, which denotes the set of locations that it represents in thesubsequent analysis, is also determined.

Let us describe how to initialise an alias graph, AGP = (VP , EP ), for a procedure P .We focus on an arbitrary parameter f of P since all parameters are handled identically andindependently. After Steensgaard’s analysis, the points-to graph Gf for f , which comprisesthe points-to relations originating from f , is always a “linked list” with at most one cycleat its end. As Steensgaard’s analysis is used, all locations pointed by a variable are unifiedinto the same equivalence class. For example, if the points-to relations originating from fare f → a, a → b and a → c, then Gf is f → {a} → {b, c} with two equivalence classes.If the points-to relations are f → a, a → b, b → c, c → b and c → d instead, then Gf

becomes f → {a} → {b, c, d}

x

with a cycle at its end.Let depth(f) be the number of equivalence classes in the points-to graph Gf . AGP =

(VP , EP ) is initialised as follows. For every parameter f of P , we add the depth(f) + 1

10

nodes that represent f (which is V f0 ), V f

1 , . . . , Vfdepth(f) and the depth(f) edges f

true−→ V f1 ,

V f1

true−→ V f2 , . . . , V

fdepth(f)−1

true−→ V fdepth(f). If Gf has a cycle (at its end), we also add

V fdepth(f)

true−→ V fdepth(f).

V f1 , . . . , V

fdepth(f) are called virtual variables, each of which is used to represent the set

of non-local locations passed from P ’s callers and accessed by P ’s dereferenced formal

parameters during the analysis. Intuitively, V fi stands for

︷︸︸︷∗ · · · ∗ f with exactly i ∗’s if

i < depth(f) and all︷︸︸︷∗ · · · ∗ f ’s with depth(f) or more ∗’s if i = depth(f). Thus, each

guarded edge thus introduced represents a points-to relation that always holds since theguard is true.

When Gf has a cycle, our parameterisation-based approach looks seemingly conserva-

tive since V fdepth(f) represents

︷︸︸︷∗ · · · ∗ f ’s with depth(f) or more ∗’s. However, as suggested

by the statistics given in Figure 1.2 and evaluated further in our experiments in Section 4.5,little precision is lost for real code.

Each virtual variable V fi is created with empty points-to information in pre-analysis

but will be iteratively updated (or filled up) during the subsequent pointer analysis. The

domain of V fi , denoted dom(V f

i ), is simply the i-th equivalence class in the points-to graphGf , with i starting from 1.

Example 2 AGfoo is initialised as shown in Figure 3.1(b), except that all the virtualvariables are empty initially. After Steensgaard’s analysis, the points-to graphs for Gx, Gy

and Gz are x → e1 → e2, y → e1 → e2 and z → e3 → e2, where e1 = {p1,q1,a1,b1},e2 = {i,j,k,g} and e3 = {r1,c1}. Thus, each formal parameter is associated with twovirtual variables. In addition, dom(V x

1 ) = dom(V y1 ) = e1, dom(V x

2 ) = dom(V y2 ) = e2,

dom(V z1 ) = e3 and dom(V z

2 ) = e2.

Lemma 1 Given a formal parameter f , V fi and V f

j never alias with each other if i andj are different.

Proof 1 As each virtual variable represents an equivalence class created by Steensgaard’sanalysis. Therefore, dom(V f

i ) ∩ dom(V fj ) = ∅. Hence, V f

i and V fj do not alias.

4.3 Top-Down Analysis

During a top-down phase, Icon resolves all the constraints in a program except that theside effects of all callsites are ignored. As a result, the points-to information propagatedinterprocedurally only flows from a caller to its callees. By summarising the side effects of aprocedure using a transfer function in terms of virtual variables, this phase is surprisinglysimple and efficient. Andersen’s analysis is simply performed on the constraint graphsof all procedures in a program simultaneously except the guarded constraint resolutiondescribed in Section 4.1 is used. To disregard the side effects of a callee invoked at acallsite, we simply do not apply its transfer function at the callsite.

The new points-to information is propagated downwards from a caller into its callees.Let P be a procedure invoked at a callsite. Let f be a formal parameter of P and a thecorresponding actual parameter at the callsite. Whenever some new points-to informationfor a is discovered, it is propagated into the virtual variables V f

1 , . . . , Vfdepth(f) of f . For

each target x pointed by a directly or indirectly such that x ∈ dom(V fi ), two cases are

distinguished. If x is a virtual variable, then x is flattened so that all the actual points-totargets represented by x are inserted into V f

i . Otherwise, x itself is inserted into V fi .

11

Example 3 Consider the top-down phase performed during the first iteration in Fig-ure 3.1(b). The constraint graph for foo initially consists of the four constraints in lines17 – 20 and those in its alias graph. In main, there are two callsites to foo. The points-torelations for their actual parameters are discovered in lines 5 – 7 in main. There are sixvirtual variables for foo with their domains given in Example 2. By propagating the newpoints-to relations discovered in main into these virtual variables, which are empty initially,we obtain foo’s alias graph in Figure 3.1(b). Note that p→ p1→ i and a→ a1→ k arepropagated into V x

1 and V x2 , q → q1 → j and b → b1 into V y

1 and V y2 , and r → r1 and

c → c1 into V z1 and V z

2 . Once Andersen’s analysis (using guarded constraint resolution)is completed, the fixed-point found for foo is given in Figure 3.1(b).

Suppose we add a call goo(x) inside foo, where goo(int **s) { ... } is unspecifiedhere. As x→ V x

1 , where V x1 = {p1,a1}, then p1 and a1 will be propagated into the virtual

variable V s1 of goo after the points-to target V x

1 is flattened since V x1 is virtual.

During top-down analysis, the call graph of a program is updated as follows. First, thecall graph is expanded on the fly whenever a new callee pointed by a procedure pointer isdetected. Second, each recursion cycle, i.e., SCC (strongly-connected component) detectedis collapsed so that all procedures contained inside are analysed context-insensitively.

4.4 Bottom-Up Analysis

During a bottom-up phase, Icon resolves all the constraints in a program by accountingfor the side effects of all callsites. As a result, the points-to information propagatedinterprocedurally flows upwards from a callee into its callers. Each SCC has its ownconstraint graph. Andersen’s analysis is applied to these constraint graphs separatelyusing our guarded constraint resolution.

We traverse the SCCs in a program’s call graph in their reverse topological order. Onvisiting an SCC, we discover its new points-to relations parametrically in terms of thevirtual variables at its entry. At the same time, its interprocedural points-to side effectson virtual variables are obtained iteratively. The entry of an SCC is formed by combiningthe entries of all procedures contained in the SCC admitting calls from outside the SCC.Thus, their alias graphs are naturally merged. For this reason, we shall speak of SCCsand procedures interchangeably below.

ALGORITHM 2: Bottom-Up Analysis

1 for each SCC P in the program’s call graph in reverse topological order do2 Stage 1: UpdateAliasGraph(P )3 Stage 2: ApplyTransFun(P )4 Stage 3: SolveConstraints(P )

As shown in Algorithm 2, an SCC P is analysed in three stages. First, P ’s alias graphis updated to establish any new aliasing relations for its virtual variables. Second, theinterprocedural points-to side effects at P ’s callsites are accounted for. Finally, a newround of guarded constraint propagation is started to discover more points-to relations forP , if needed. Due to a cyclic dependency between Stages 2 and 3, Section 11 can be readbefore Section 11 to ease understanding.

12

Stage 1: Updating Alias Graphs

After the preceding top-down phase in the current iteration is over, some virtual variablesof a procedure P may contain new points-to targets. As a result, there may be new aliasesformed among the virtual variables of P . Note that the virtual variables of the sameparameter do not alias with each other by Lemma 1.

ALGORITHM 3: UpdateAliasGraph(P ) //AGP =(VP , EP )

1 repeat2 for every pair of formal parameters f and g of P do3 if NON-ALIAS(f, g) then4 continue

5 for every pair of virtual variables V fi and V g

j in VP (for some i and j) such that

V fi ∩V

gj 6=∅ do

6 if there exists C(V fi , V g

j ) = V fi ∩ V g

j 6= ∅ encoded earlier in line 8 then

7 continue

8 Encode C(V fi , V g

j ) as V fi ∩ V g

j 6= ∅9 Add a new aliasing variable Af,g

i,j , which is a placeholder standing for V fi ∩ V g

j

10 Add V fi−1

C(Vfi ,V

gj )

−→ Af,gi,j and V g

j−1

C(Vfi ,V

gj )

−→ Af,gi,j

where V f0 denotes f and V g

0 denotes g by the construction of an alias graph(when i = 1)

until11 no new aliases are found ;

UpdateAliasGraph given in Algorithm 3 is simple. We examine every pair of pa-rameters f and g to look for new aliasing relations (line 1). If f and g are discovered notto alias in pre-analysis, we are done (lines 3 – 4). Otherwise, we proceed to discover new

aliasing relations between V fi , a virtual variable of f , and V g

j , a virtual variable of g (line

5). We introduce a new aliasing relation between V fi and V g

j in P ’s alias graph (lines 8

– 10) if it is not encompassed by an existing one (lines 6 – 7). In line 9, Af,gi,j symbolises

V fi ∩V

gj but is not actually computed. Such aliasing variables serve as a conduit to allow

points-to information to be propagated along aliased locations.

Example 4 Let us apply UpdateAliasGraph to foo during the second iteration illus-trated in Figure 3.1(c). Initially, foo’s alias graph is the same as that in Figure 3.1(b)except that the contents of all virtual variables are shown as in Figure 3.1(c). SinceV x1 ∩ V

y1 = {q1}, Cx,y

1,1 is introduced to encode V x1 ∩ V

y1 6= ∅. By using Ax,y

1,1 to symbolise

V x1 ∩ V

y1 , two new points-to relations, V x

0

C(V x1 ,V

y1 )−→ Ax,y

1,1 and V y0

C(V x1 ,V

y1 )−→ Ax,y

1,1 , are intro-

duced, where V x0 and V y

0 stand for x and y, respectively. Similarly, Since V x2 ∩V

y2 = {j, g},

Cx,y2,2 is introduced to encode V x

2 ∩ Vy2 6= ∅. By using Ax,y

2,2 to symbolise V x2 ∩ V

y2 , two new

points-to relations, V x1

C(V x2 ,V

y2 )−→ Ax,y

2,2 and V y1

C(V x2 ,V

y2 )−→ Ax,y

2,2 , are introduced. As a result,foo’s alias graph has been updated from Figure 3.1(b) to Figure 3.1(c).

It is easy to see that UpdateAliasGraph terminates as the number of aliasing re-lations is finite. The following lemma states that all aliasing relations at the entry of aprocedure are captured.

13

Lemma 2 (Aliasing) Let f and g be two formal parameters of a procedure P . If V fi and

V gj alias for some i and j, then there must exist an aliasing relation C and an aliasing

variable A such that V fi−1

C−→ A and V gj−1

C−→ A in P ’s alias graph, where C = C(V fi , V

gj )

and A = Af,gi,j .

Proof 2 Following from the construction of an alias graph given in Algorithm 3.

Stage 2: Applying Transfer Functions

Let us describe how to transfer context-sensitively the side effects of every callee Q invokedat each callsite I that is contained in a procedure P . For a virtual variable v of Q, wewrite MQ

I (v) to represent the subset of v that is propagated only from the callsite I. For

an aliasing condition c = vi ∩ vj 6= ∅ that appears in Q’s alias graph, we overload MQI by

writing MQI (c) to mean MQ

I (vi) ∩MQI (vj) 6= ∅, which represents the aliasing condition

created at the callsite I alone (as if the other callsites were non-existent) .

ALGORITHM 4: ApplyTransFun(P )

1 for each callsite I in P do2 for each callee Q invoked at callsite I do

3 for each virtual variable V fi of Q do

4 S ←−MQI (V

fi )

5 for each (c, v) ∈ TransQ(Vfi ) do

6 if MQI (c) holds then

7 if v is a virtual variable V gj of Q then

8 T ←−MQI (V

gj )

else9 T ←− {v}

10 for (s, t) ∈ S × T do11 Insert (true, t) into s’s points-to set in P ’s constraint graph

ApplyTransFun given in Algorithm 4 is straightforward. The transfer functionTransQ referred to in line 5 is defined precisely in Section 11 below. It is worth em-phasising that a guarded points-to relation (c, v) created in a callee Q is ignored at a

callsite I when its guard MQI (c) evaluates to false at the callsite (line 6). As a result, the

imprecision illustrated in Figure 1.1(b) is avoided.

Example 5 Consider Figure 3.1(c) again. Let us illustrate Algorithm 4 for main byapplying Transfoo(V

z1 ) = {(true, V x

2 ), (Cx,y1,1 , g)} to its two callsites at lines 9 – 10 where

foo is invoked. We have P = main, Q = foo and I ∈ {9, 10}. In the first callsite atline I = 9, we know from Figure 3.1(c) that Mfoo

9 (V z1 ) = {r1} and Mfoo

9 (V x2 ) = {i,j}.

From the target (true, V x2 ), the points-to relations r1

true−→ i and r1true−→ j are established

at this callsite. For the other target (Cx,y1,1 , g), we know that Mfoo

9 (Cx,y1,1 ) = (Mfoo

9 (V x1 ) ∩

Mfoo9 (V y

1 ) 6= ∅) = ({p1,q1} ∩ {q1} 6= ∅) = ({q1} 6= ∅) = true. So the points-to relation

r1true−→ g is also found. For the second callsite at line I = 10, Mfoo

10 (V z1 ) = {c1} and

Mfoo10 (V x

2 ) = {k}. As Mfoo10 (Cx,y

1,1 ) = (Mfoo10 (V x

1 ) ∩Mfoo10 (V y

1 ) 6= ∅) = ({a1} ∩ {b1} 6= ∅) =

(∅ 6= ∅) = false, only c1true−→ k is found at this second callsite.

14

Stage 3: Solving Constraints

The guarded constraint propagation is performed on the constraint graph of a procedureP using the rules given in Table 4.1 as follows [35] (by considering the new points-torelations added to P ’s alias graph and the new ones introduced at P ’s callsites). First,points-to cycles are detected and collapsed to make the constraint graph acyclic. Second,the points-to information is propagated across the existing copy edges. Third, all loadand store edges are processed, resulting in new copy edges to be added to the constraintgraph. The process terminates once a fixed point is reached.

Finally, the interprocedural points-to side effects of P are simply expressed on its virtualvariables with one single transfer function, denoted TransP . In particular, TransP (V f

i ) is

directly read off as the points-to set of V fi in P ’s constraint graph. However, V f

i+1 is not

included because ∗V fi = V f

i+1, i.e.,︷︸︸︷∗ · · · ∗ f =

︷︸︸︷∗ · · · ∗ f with i+1 ∗’s in both sides represents

a no-op (assignment).

Example 6 Consider Figure 3.1(c). The points-to set at a node is not shown completelyto avoid cluttering. After a fixed point has been reached, the points-to sets {(true, g)} and{(true, V x

2 ), (Cx,y1,1 , g} will be directly available at V y

1 and V z1 , respectively. So the transfer

function Transfoo as shown is trivially available.

4.5 Soundness and Context Sensitivity

As Icon represents a generalisation of Andersen’s analysis with context sensitivity beingrealised. It suffices to argue briefly why the soundness of our analysis is still maintained.An analysis is sound for a program if it over-approximates its points-to relations. Inaddition, we also examine the precision achieved by Icon in terms of the accuracy of itspoints-to information.

Theorem 1 (Soundness) Icon is sound.

Proof 3 Andersen’s analysis is sound. It suffices to show that its soundness is still main-tained by Icon in achieving context sensitivity in its three phases in Algorithm 1. Pre-analysis is sound as Steensgaard’s analysis is. Our top-down analysis performs essentiallyAndersen’s analysis with the side effects of all callsites ignored. Such side effects are con-sidered when performing Andersen’s analysis during our bottom-up analysis. Finally, allaliasing relations at procedure entries are tracked by Lemmas 1 and 2 and our guardedconstraint resolution rules are the same as the standard ones except that copy edges areguarded by aliasing relations correctly established by Lemma 2.

Recall that when the points-to graph Gf for a formal parameter f of a procedure P

contains a cycle at its end, V fdepth(f) stands for all

︷︸︸︷∗ · · · ∗ f ’s with depth(f) or more ∗’s in

P . In this case, some points-to relations may be over-approximated when P is analysed.Therefore, the following theorem is stated under a caveat related to such imprecisioninherited from Steensgaard’s analysis.

Theorem 2 (Context-Sensitivity) Suppose the points-to graph Gf obtained in pre-analysis is acyclic for every formal parameter f of every procedure P in a program. ThenIcon is context-sensitive for the program (1) by at least 1-callsite and (2) by (acyclic) callpaths if the formal parameters of every procedure are alias-free.

15

Proof 4 If Gf is acyclic, then every virtual variable V fi for every procedure P represents

precisely the set of non-local locations that may be passed from its callers, i.e.,︷︸︸︷∗ · · · ∗ a with

exactly i ∗’s for each corresponding actual parameter a. Therefore, Statement (1) is truebecause Icon’s aliasing information is at least 1-callsite-accurate (line 6 in Algorithm 4).In the absence of aliasing at all procedure entries, the side-effects of every procedure areaccounted for as if the procedure were cloned at each of its callsites. Thus, Statement (2)is true.

void goo() {int i, j;int *p, *q = &iint *a, *b = &j;bar(&p, q);bar(&a, b);

}void bar(int **x, int *y) {

foo(x, y);}void foo(int **u, int *v) {

*u = v;}

int g;void goo1() {

int *a, *b, *x;x = bar(&a, &b);

}void goo2() {

int *c, *y;y = bar(&c, &c);

}int* bar(int **x, int **y) {

int *z;foo(x, y, &z);return z;

}void foo(int **u, int **v, int **w) {

*u = &g;*w = *v;

}(a) (b)

Figure 4.1: Context-sensitivity of Icon.

It is possible to capture better the aliasing information at a procedure entry by usinga more precise but more expensive pre-analysis or building its virtual variables on the flytogether with Icon. This does not appear to be necessary in practice as Steensgaard’sanalysis is a good choice for our pre-analysis. Precision loss is small when some points-tographs Gf ’s have cycles at their tails (Section 4.2). For the 16 SPEC benchmarks used,176.gcc is the worst but with only 3.5% of all Gf ’s containing cycles. The average acrossthese benchmarks is only 1.3%.

Example 7 Let us illustrate Theorem 2 with two small programs. Icon is context-sensitive by call paths in Figure 4.1(a) due to the absence of aliases. Propagating themodification side effects of foo with Transfoo(V

u1 ) = {(true, V v

1 )} into bar, we obtainTransbar(V

x1 ) = {(true, V y

1 )}. By propagating the modification side effects of bar into its

two callsites, we obtain ptrue−→ i and a

true−→ j context-sensitively. In Figure 4.1(b), whichis slightly modified from Figure 1.1(b), Icon is only 1-callsite-context-sensitive. Given

that V x1 = {a,c} and V y

1 = {b,c}, V z1true−→ g is created when the callsite to foo in bar is

analysed. As a result, xtrue−→ g is generated when the callsite to bar in goo1 is analysed.

Similarly. ytrue−→ g is generated when the callsite to bar in goo2 is analysed. However,

the spurious xtrue−→ g will be avoided if a 2-callsite-context-sensitive analysis is used.

By using virtual variables, Icon is more precise than a 1-callsite-sensitive analysis (inthe same inclusion-based framework with recursion cycles collapsed in both cases) and

16

makes a good tradeoff between efficiency and precision as evaluated extensively below.

5 Evaluation

In our experimental validation, we show that Icon has met the three design principlesmentioned earlier in Section 1: precision, efficiency and simplicity. We present our resultsand analysis for 21 C/C++ programs with a total of 2.7 MLOC, including the 15 C pro-grams and 1 C++ from SPEC2000 (600 KLOC) as well as five open-source C applications(2.1 MLOC). The five applications are wine-0.9.24 (a tool that allows windows applica-tions to run on Linux), icecast-2.3.1 (a steaming media server), gdb-6.8 (a debugger),httpd-2.0.64 (an HTTP server) and sendmail-8.14.2 (a general-purpose internet emailserver).

Table 5.1: Benchmark characteristics (the last two columns are produced with no proce-dure pointers resolved).

Program KLOC #Procs #Pointers #Loads+Stores#Callsites

#SCCsLargest

Total Indirect SCC

164.gzip 8.6 113 3004 586 418 2 0 0175.vpr 17.8 275 7930 2160 1995 2 0 0176.gcc 230.4 2256 134380 51543 22353 140 179 398177.mesa 61.3 1109 44582 17320 3611 671 1 1179.art 1.2 29 600 103 163 0 0 0181.mcf 2.5 29 1317 526 82 0 1 1183.equake 1.5 30 1203 408 215 0 0 0186.crafty 21.2 112 11883 3307 4046 0 5 2188.ammp 13.4 182 9829 1636 1201 24 6 2197.parser 11.4 327 8228 2597 1782 0 42 3252.eon 41.2 1296 41950 4001 12733 80 17 1253.perlbmk 87.1 1079 54816 20900 8470 58 12 322254.gap 71.5 857 61435 22840 5980 1275 33 20255.vortex 67.3 926 40260 11256 8522 15 12 38256.bzip2 4.7 77 1672 434 402 0 0 0300.twolf 20.5 194 20773 8657 2074 0 5 1gdb-6.8 474.5 7810 337706 105917 52462 2967 170 128httpd-2.064 128.1 3000 60027 18450 3959 200 12 4icecast-2.3.1 22.3 603 15098 9779 877 40 14 1sendmail-8.14.2 115.2 2656 107242 29220 16973 381 31 76wine-0.9.24 1338.1 77829 1330840 137409 362787 23523 251 313

Some statistics on these 21 C/C++ programs, which are obtained on their intermediaterepresentations (including library code) before procedure pointers are resolved, are givenin Table 5.1. We will also briefly discuss and analyse our results on the 12 C/C++benchmarks from SPEC2006, which reveal the same trend as the 21 programs focused onin this paper.

Our computer platform is a 3.0GHz quad-core Intel Xeon running Red Hat EnterpriseLinux 5 (Linux kernel version 2.6.18) with 16GB memory.

17

5.1 Methodology

There are only a few earlier attempts [13, 32, 33, 30] to achieve context sensitivity on topof Andersen’s analysis for C/C++. As discussed in Section 6, Fulcra [32] represents astate-of-the art solution. It is also the most precise inclusion-based analysis since it clones(conceptually) the statements in a procedure with interprocedural points-to side effectsand inlines them at its callers.

Therefore, we compare Icon and Fulcra in terms of their efficiency and precision onanalysing large C/C++ programs. Just like Icon, Fulcra also collapses the recursioncycles (or SCCs) in a program so that the procedures in an SCC are analysed context-insensitively. Otherwise, Fulcra is even more unscalable, especially for large programs[32].

We show that Icon spends just under 35 minutes on analysing all the 21 programs inthe benchmark suite (an average of less than 1.7 minutes per program) while achievingnearly the same precision as that of Fulcra. In contrast, Fulcra spends over 2 hoursin analysing 19 programs and fails to run to completion in 5 hours for the remainingtwo. To highlight further the importance of our parameterisation-based approach, we alsodemonstrate the performance advantages of Icon over Nonpa. Recall that Nonpa is anon-parameterised version of Icon. So these two analyses have exactly the same precision.

We measure the precision of an analysis in terms of its capability in alias disambiguationand the quality of the SSA form constructed for a program. These two metrics are believedto be critically important in determining the effectiveness of compiler optimisations.

5.2 Implementation

We have implemented Icon, Nonpa, and Fulcra in Open64 (version 5.0), an open-sourceindustrial-strength compiler, on its High WHIRL IR at IPA (interprocedural analysis)phase. All the analyses are offset-based field-sensitive, by using an existing field-sensitiveanalysis module in Open64 modified to work with our guarded constraint resolution. Inthis module, the positive weight cycles (PWCs) that arise from processing fields of structobjects [34] are detected and collapsed. The maximum number of offsets considered for astruct is 256, with the last one representing the 256th and all subsequent offsets availablein the struct. However, arrays are considered monolithic. Heap objects are modeled withcontext-sensitive heap cloning [21, 43] for allocation wrappers. All wrappers are identifiedand treated as allocation sites in the manner as described in [7, 44]. All the three analysesare compiled under the optimisation flag “-O2” in Open64.

We have implemented Fulcra by following its algorithms [32]. For its top-down phase,we use the code implemented by the Open64 team and already made available in Open64,which is implemented by following the rules in [32, Figure 4.3]. For its bottom-up phase,we coded its summarisation using [32, Figure 4.11] and constraint compaction using [32,Figure 4.15].

We have extended an existing implementation of Wave Propagation [35] in Open64to support constraint resolution for all the three analyses evaluated. As discussed inSection 2.1, global variables are tracked separately for efficiency considerations, withoutparticipating in procedure summarisation. It is worth mentioning again that in eachanalysis, the recursion cycles (or SCCs) in a program are merged so that the proceduresin each SCC are analysed context-insensitively.

18

Table 5.2: Alias disambiguation and SSA quality of Icon and Fulcra (‘=’ means “sameas left”).

ProgramAlias Disambiguation SSA Quality

#Queries #Not Aliased (%) #µ’s #χ’sFulcra Icon Fulcra Icon Fulcra Icon Fulcra Icon

164.gzip 1483 = 38.57 = 10582 = 12252 =175.vpr 23557 = 75.24 = 26386 = 34172 =176.gcc 101187 101134 31.58 31.11 482184 482296 580224 580263177.mesa 144328 144398 32.13 32.08 30536 30541 71923 71935179.art 3772 = 84.51 = 2697 = 3396 =181.mcf 8588 = 30.34 = 1363 = 2625 =183.equake 9422 = 80.11 80.09 7768 = 8508 =186.crafty 15545 = 50.12 = 303532 = 243282 =188.ammp 38893 = 43.87 = 20539 = 28320 =197.parser 9529 9636 8.36 8.32 23855 = 33161 33165252.eon 84838 = 18.56 = 196940 = 309167 =253.perlbmk 98090 = 12.81 = 135050 135059 175081 175097254.gap 19360 = 2.11 = 68632 = 105058 =255.vortex 81542 = 32.31 = 210699 = 302539 =256.bzip2 2404 = 54.91 = 4174 = 5515 =300.twolf 92917 = 70.56 = 60575 = 66026 66158gdb-6.8 >5 hrs 540650 >5 hrs 17.07 >5 hrs 722775 >5 hrs 959068httpd-2.064 56689 = 11.41 = 167146 = 184687 =icecast-2.3.1 7774 = 25.94 = 18810 = 32910 =sendmail-8.14.2 144336 144398 30.71 30.67 517244 517286 581516 581599wine-0.9.24 >5 hrs 434805 >5 hrs 29.88 >5 hrs 1743726 >5 hrs 1865159

5.3 Results and Analysis

We first show that Icon is nearly as precise as Fulcra (which is context-sensitive byacyclic call paths) for the 21 programs given in Table 5.1. Note that Nonpa is a non-parameterised version of Icon and thus has the same precision as that of Icon. We thenanalyse why Icon achieves such precision even though running significantly faster thanFulcra and Nonpa.

Precision

The aliasing information is used extensively to guide compiler optimisations in variouspasses of Open64’s backend, e.g., WOPT (Whirl Optimiser), LNO (Loop-Nest Optimiser)and CG (Code Generator). As shown in Table 5.2, Icon detects the same or nearly thesame percentage of non-alias pairs in all the 21 benchmarks. This percentage metric isused since different aliasing relations detected earlier can affect queries issued later. Forboth algorithms, the same alias analysis interface, AliasAnalyzer, provided in Open64 isused to issue alias queries generated by various compiler optimisations. Only the onesthat require the results of a pointer analysis to disambiguate are issued.

Many program analysis and compiler optimisations are nowadays performed on SSA.

19

Table 5.3: Analysis times of Icon, Nonpa and Fulcra.

ProgramAnalysis Time (secs)

Icon Nonpa Icon Speedup Fulcra Icon Speedup

164.gzip 0.03 0.03 1.00 0.03 1.00175.vpr 0.07 0.16 2.29 0.15 2.14176.gcc 161.60 689.42 4.27 3805.01 23.55177.mesa 39.56 87.60 2.21 94.32 2.38179.art 0.00 0.00 1.00 0.00 1.00181.mcf 0.02 0.01 0.50 0.01 0.50183.equake 0.01 0.02 2.00 0.01 1.00186.crafty 0.08 0.14 1.75 0.13 1.63188.ammp 0.12 0.20 1.67 0.14 1.17197.parser 0.44 0.67 1.52 1.33 3.02252.eon 16.70 93.70 5.61 88.60 5.31253.perlbmk 81.75 615.32 7.53 3107.28 38.01254.gap 63.35 290.61 4.59 390.87 6.17255.vortex 19.59 37.66 1.92 44.10 2.25256.bzip2 0.01 0.02 2.00 0.02 2.00300.twolf 0.11 0.41 3.73 0.36 3.27gdb-6.8 586.51 2958.02 5.04 >5 hrs > 30httpd-2.064 67.84 153.22 2.26 148.46 2.19icecast-2.3.1 7.23 26.56 3.67 27.17 3.76sendmail-8.14.2 49.75 220.08 4.42 380.08 7.64wine-0.9.24 948.75 >5 hrs > 17 >5 hrs > 17

In Open64, SSA construction is intraprocedural based on the approach introduced in [8].For each store ∗x = y, a v = χ(v) operation is introduced for each location v pointed by x.Similarly, for each load x = ∗y, a µ(v) operation is introduced for each location v pointedby y. When converted to SSA form, each v = χ(v) is treated as both a def and use of vand each µ(v) as a use of v. In the absence of strong updates, the def v must incorporatethe pointer information from both y and the use v. As shown in Table 5.2, Icon andFulcra give rise to nearly the same SSA representations in all benchmarks (measured interms of χ and µ operations). For a benchmark, Icon results in the same SSA as Fulcraif Icon scores two =’s, one for Column “#µ’s” and one for Column “#χ’s”.

These results show that Icon is nearly as precise as Fulcra in terms of the qualityof the built SSA form and the precision of the discovered alias information.

Efficiency

From the analysis times given in Table 5.3 for the three analyses compared, we see thatIcon is the only one that scales to millions of lines of code.

Comparing Icon and Fulcra Fulcra spends over 2 hours analysing the 16 programsfrom SPEC2000 (totalling 600 KLOC) and fails to terminate within 5 hours when analysinggdb and wine (totalling 1.8 MLOC). In contrast, Icon spends just under 7 minutes on

20

SPEC2000 and just under 26 minutes on both gdb and wine together. For the other threeapplications, httpd, icecast and sendmail, Icon is also faster.

Fulcra is not scalable when analysing programs with large recursion cycles, as alsoreported by its author in [32, Table 4.3]. Fulcra conservatively identifies the side-effect-causing constraints (called critical edges) of a callee, then “inlines” them at its callers,and finally, resolves them at those callers to achieve cloning-based context-sensitivity.There are three reasons affecting its scalability. First, such critical edges are recomputed(conservatively) whenever a callee is re-summarised. Second, more edges than necessarymay be pasted from the callee into a caller, causing a rippling effect upwards its call chains.Third, the pasted edges are solved repeatedly at different callsites. As a result, many edgesin a program’s constraint graph are introduced unnecessarily, slowing down the analysisperformance.

Among the 19 benchmarks analyzable by Fulcra, 176.gcc is the most costly toanalyse, taking a little over an hour to complete, due to a large number of constraintsmoved from callees to their callers during its analysis, as highlighted in Figure 5.1. Incontrast, Icon spends less than 3 minutes in analysing this benchmark. It should be notedthat both Fulcra and Icon analyse all the 21 programs with their SCCs collapsed. Letus take a closer look at Figure 5.1. During the second iteration (from around the 4thminute to the 13th minute), Fulcra spends a lot of time on computing the summariesof the SCCs in the call graph of 176.gcc. The largest SCC, as highlighted in Figure 5.2,comprises nearly 400 procedures across 22 source files. During the period from the 15thminute to the 22nd minute in the third iteration and the period from the 40th minute tothe 53rd minute in the fourth iteration, over 3000 critical edges identified in the largestSCC are “inlined” at, i.e., promoted to each of its callers. In contrast, Icon avoids thiscost by iteratively summarising the side effects of each SCC in place and then transferringthe side effects to its callers.

0 5 10 15 20 25 30 35 40 45 50 55 600

50000

100000

150000

200000

250000

300000

Time (minutes)Time (minutes)

Num

ber o

f Con

stra

ints

Iter 1 Iter 2 Iter 3 Iter 4

Figure 5.1: Side-effect-causing constraints promoted from a callee to its callers for 176.gccby Fulcra. The two rectangles are drawn to highlight the parts of those iterations whenthe largest SCC is analysed.

Comparing Icon and Nonpa Nonpa does not terminate within 5 hours when analysingwine. For the remaining 20 benchmarks, Nonpa spends 5173.85 secs (> 86 mins) whileIcon spends 1094.77 secs (< 19 mins).

Full parameterisation has no benefits for small programs such as art, equake and gzip

or even hurts performance in the case of mcf due to the use of virtual variables. However,

21

Figure 5.2: Call graph of 176.gcc, with part of the largest SCC (referred to in Table 5.1)zoomed-in.

significant performance improvements occur for large programs such as eon, gap, gcc,perlbmk, gdb, sendmail and wine. This is because these programs have a larger numberof pointers as well as loads and stores (Table 5.1). For some large programs such as mesa,vortex and vpr, the improvements are less remarkable since they require relatively morevirtual variables to be used, as shown in Table 5.4. In the case of httpd, with manypointers as well as loads and stores but few virtual variables, the improvement is smallsince the number of levels of indirections among its pointers is small (Figure 1.2).

More Analysis To the best of our knowledge, Icon is the fastest context-sensitiveinclusion-based pointer analysis ever reported, at least measured in terms of the 16 C/C++SPEC2000 benchmarks, five open-source applications, and the 12 C/C++ SPEC2006benchmarks (discussed in Section 5.4). There are several reasons behind this.

• By using virtual variables to parameterise pointer information, Icon propagatespoints-to information across copy edges significantly less frequently than Fulcraand Nonpa, as shown in Figure 5.3. This is particularly pronounced in the bench-marks for which Icon achieves the best speedups. The analysis overhead thus in-curred is small since the alias graphs, as shown in Table 5.4, are small.

• Analyzing a program top-down rather than bottom-up first has two benefits. Afterthe first top-down phase is completed, two observations are made. First, the majorityof aliasing variables (77.88% on average) in a program have been introduced as shownin Figure 5.4. In several small to medium benchmarks, such as ammp, equake, mcfand vpr, however, most aliases still need to be discovered. Second, the majorityof procedure pointer targets in a program have also been resolved, as shown inFigure 5.5. However, in some benchmarks, such as gcc and vortex, most of theirindirect call edges will have to be resolved later, because they need to be discoveredwith at least one bottom-up phase being performed.

• Pre-analysis helps avoid unnecessary checks for aliasing (lines 3 – 4 in UpdateAlias-Graph). Most of pointer formal parameters are not aliased as shown in Figure 5.6.On average, around 85% of the procedures in a program are alias-free.

22

Table 5.4: Alias graph Statistics.

ProgramAverage #Nodes in Alias Graphs over #Aliasing Var per Procedure

Total #Nodes in All Constraint Graphs AVG MAX

164.gzip 4.94 0.15 2175.vpr 13.81 0.71 8176.gcc 4.95 0.69 12177.mesa 14.39 0.68 9179.art 3.33 0.00 0181.mcf 10.10 1.00 5183.equake 5.07 0.10 1186.crafty 1.46 0.05 1188.ammp 4.41 0.30 3197.parser 5.72 0.41 6252.eon 5.27 0.15 4253.perlbmk 1.78 0.13 5254.gap 2.59 0.51 8255.vortex 13.01 0.51 8256.bzip2 4.25 0.00 0300.twolf 1.83 0.49 5gdb-6.8 9.88 0.45 16httpd-2.064 3.63 0.08 5icecast-2.3.1 2.20 0.05 2sendmail-8.14.2 5.98 0.45 12wine-0.9.24 4.21 0.66 15

Average 5.85 0.36 6

164.

gzip

175.

vpr

176.

gcc

177.

mes

a17

9.ar

t18

1.m

cf18

3.eq

uake

186.

craf

ty18

8.am

mp

197.

pars

er

252.

eon

253.

perlb

mk

254.

gap

255.

vort

ex25

6.bz

ip2

300.

twol

fgd

b-6.

8ht

tpd-

2.06

4

icec

ast-2

.3.1

send

mai

l-8.1

4.2

win

e-0.

9.24

aver

age0%

100%

200%

300%

400%

500%

NONPAFULCRA

9X 11X 14X 11X12X

Figure 5.3: Number of times on processing copy edges by Nonpa and Fulcra (normalisedw.r.t. Icon).

23

164.

gzip

175.

vpr

176.

gcc

177.

mes

a

179.

art

181.

mcf

183.

equa

ke

186.

craf

ty

188.

amm

p

197.

pars

er

252.

eon

253.

perlb

mk

254.

gap

255.

vort

ex25

6.bz

ip2

300.

twol

fgd

b-6.

8

http

d-2.

064

icec

ast-2

.3.1

send

mai

l-8.1

4.2

win

e-0.

9.24

aver

age0%

20%

40%

60%

80%

100%

Figure 5.4: Percentage of virtual variables discovered after the 1st top-down phase iscompleted.

164.

gzip

175.

vpr

176.

gcc

177.

mes

a

179.

art

181.

mcf

183.

equa

ke

186.

craf

ty

188.

amm

p

197.

pars

er

252.

eon

253.

perlb

mk

254.

gap

255.

vort

ex25

6.bz

ip2

300.

twol

fgd

b-6.

8

http

d-2.

064

icec

ast-2

.3.1

send

mai

l-8.1

4.2

win

e-0.

9.24

aver

age0%

20%

40%

60%

80%

100%

Figure 5.5: Percentage of procedure pointer targets resolved after the 1st top-down phaseis completed.

164.

gzip

175.

vpr

176.

gcc

177.

mes

a

179.

art

181.

mcf

183.

equa

ke

186.

craf

ty

188.

amm

p

197.

pars

er

252.

eon

253.

perlb

mk

254.

gap

255.

vort

ex25

6.bz

ip2

300.

twol

fgd

b-6.

8

http

d-2.

064

icec

ast-2

.3.1

send

mai

l-8.1

4.2

win

e-0.

9.24

aver

age0%

10%

20%

30%

40%

Figure 5.6: Percentage found to be aliased among all pairs of pointer formal parametersby pre-analysis.

24

Memory Usage Table 5.5 compares Icon, Fulcra and Nonpa in terms of memoryusage. For the 20 benchmarks that can be analysed by Nonpa, Icon consumes 16% morememory on average. Icon needs extra space to store virtual variables but saves space asparameterisation reduces the number of copy edges created (Figure 5.3). Compared withNonpa, Icon uses more memory in 15 out of the 20 benchmarks that are analysable byNonpa. For some small benchmarks with a few procedures and pointers (Table 5.1), suchas art, mcf and equake, Icon consumes slightly more memory than Nonpa. Some rela-tively large pointer-intensive benchmarks (Figure 1.2), including mesa, eon, gap, vortexand httpd, require at least 20MB of memory each to analyse in either case. For thesebenchmarks, Icon requires more space to store virtual variables and thus consumes slightlymore memory than Nonpa. For the largest benchmarks, such as gcc, perlbmk and gdb,Icon has succeeded in exploiting parameterisation to reduce significantly the number ofcopy edges created by Nonpa (Figure 5.3). As a result, for gcc, perlbmk and gdb, Iconconsumes only 80%, 71% and 70%, respectively, of the amount of memory consumed byNonpa.

Let us now compare Icon and Fulcra. For the 19 benchmarks that can be analysedby Fulcra, Icon consumes 12% more memory on average. The extra amount of memoryincurred by Icon for small and medium benchmarks is negligible. However, for largebenchmarks, Icon uses less memory than Fulcra. In the case of gcc, mesa and perlbmk,Icon consumes only 56%, 77% and 43%, respectively, of the amount of memory consumedby Fulcra, as Fulcra introduces a large number of extra constraints when promotingside-effect-causing statements from a callee to its callers, especially in the presence of largerecursion cycles (Figure 5.1).

Simplicity

For this design goal, we aim to leverage the recent advances in inclusion-based analysis sothat context sensitivity can be achieved in the same constraint resolution framework. Inother words, the existing code base should be reused as much as possible.

Icon’s pre-analysis is bootstrapped by the standard Steensgaard’s unification-basedanalysis [42]. During each iteration, both the top-down and bottom-up phases are per-formed using the same constraint resolution engine based on an existing module for WavePropagation [35] in Open64. This module is modified so that guarded copy edges arehandled in the presence of aliasing variables.

5.4 SPEC 2006

We briefly discuss the results obtained on comparing Icon and Fulcra using the 12 Cprograms in SPEC2006 (totalling 1.0 MLOC). Icon is once again nearly as precise asFulcra in terms of their capabilities in alias disambiguation and the quality of the SSAform built for a program.

Icon spends only less than 14 minutes (803.04 secs) in analysing all the 12 programsin SPEC2006. In contrast, Fulcra spends over 48 minutes (2922.48 secs) in analysing10 benchmarks and fails to terminate within 5 hours for the remaining two benchmarks,400.perlbench and 403.gcc. These are the top two in SPEC2006 ranked in terms of howmany pointers and SCCs they contain: 400.perlbench has 13283 pointers, 46792 loadsand stores and 30 SCCs (with the largest SCC containing 461 procedures) and 403.gcc has35697 pointers, 123389 loads and stores and 317 SCCs (with the largest SCC containing436 procedures). As in the case of Table 5.1, the statistics on SCCs are collected before

25

Table 5.5: Memory usage of Icon, Nonpa and Fulcra.

ProgramMemory Usage (MBs)

Icon NonpaIcon Increase

FulcraIcon Increase

(Icon/Nonpa) (Icon/Nonpa)

164.gzip 2.90 1.15 2.52 1.23 2.36175.vpr 9.40 8.23 1.14 8.15 1.15176.gcc 425.86 535.11 0.80 754.38 0.56177.mesa 67.36 60.48 1.11 87.37 0.77179.art 0.83 0.82 1.01 0.82 1.01181.mcf 1.09 1.09 1.00 1.06 1.03183.equake 1.07 1.07 1.00 1.07 1.00186.crafty 5.93 5.32 1.11 5.41 1.10188.ammp 7.21 7.01 1.03 6.81 1.06197.parser 10.85 9.55 1.14 8.56 1.27252.eon 68.07 58.07 1.17 59.90 1.14253.perlbmk 131.54 186.18 0.71 306.18 0.43254.gap 85.85 51.84 1.66 57.19 1.50255.vortex 38.20 34.88 1.10 35.96 1.06256.bzip2 2.18 1.67 1.31 1.71 1.27300.twolf 12.22 12.13 1.01 12.21 1.00gdb-6.8 1633.45 2333.45 0.70 >5 hrs >5 hrshttpd-2.064 29.99 20.09 1.49 23.14 1.30icecast-2.3.1 20.19 16.87 1.20 16.66 1.21sendmail-8.14.2 132.58 116.11 1.14 120.38 1.10wine-0.9.24 1820.88 >5 hrs >5 hrs >5 hrs >5 hrs

procedure pointers are resolved. For these two benchmarks, Fulcra is not scalable forthe same reason illustrated for 176.gcc in Figure 5.1.

6 Related Work

Context Sensitivity There are many pointer analyses in the literature with differenttypes of flow sensitivity assumed: some are flow-sensitive [6, 11, 15, 17, 19, 20, 45, 50, 57],some are inclusion-based [33, 30, 51] and some are unification-based [9, 12, 21, 27, 28].To account for the interprocedural side effects of a procedure context-sensitively, somepointer analyses resort to cloning [11, 30, 48] while others rely on procedure summarisation[6, 10, 19, 50, 57, 60]. While Binary Decision Diagrams can be used to handle efficientlythe exponential number of contexts by exploiting their similarities [3, 48, 60], cloning-based algorithms are still not scalable to large programs. When context sensitivity isconsidered, different types of precision are distinguished if calling contexts are identifiedby full call paths [19, 57], assumed aliases at callsites [6, 20], acyclic call paths (with theSCCs in the call graph being collapsed) [19, 57] and approximated call paths within theSCCs [11, 48, 50]. The research described in [21, 52] focuses on achieving scalability forcontext-sensitive heap cloning and is thus orthogonal to this research.

There are some earlier attempts [13, 32, 33, 30] on adding context sensitivity to An-dersen’s analysis to analyse C/C++ programs. Cloning [30] achieves context sensitivity

26

trivially by analysing different calls to a procedure using different clones of the procedure.Fulcra [32, 33], which improves [13], clones only the side-effect-causing constraints, i.e.,so-called critical edges in a procedure. The analysis introduced in [59] is demand-drivenrather than a whole-program analysis.

Icon is more powerful than a 1-callsite-sensitive analysis and less so than a cloning-based context-sensitive analysis when recursion cycles are collapsed in all cases (Theo-rem 2). Therefore, a good balance between efficiency and precision is maintained to makeIcon practical for compilers.

Parameterisation To distinguish the side effects of a procedure context-sensitively atits different callsites, symbolic names have been used to represent points-to relations passedfrom its callers into the procedure, such as invisible blocks [11, 20], auxiliary parameters[28], extended parameters [50] and semi-parameterised spaces [57]. Some important bene-fits include improved precision (due to more strong updates enabled) if flow sensitivity isconsidered [50, 57] and faster analysis (by propagating one symbolic name instead of allindividual locations abstracted). In [28], its parameterisation is performed on a unification-based analysis. In [11, 50], when flow sensitivity is considered, some aliased symbolic namesof a procedure are either allowed or merged, trading precision for efficiency. In [49], theanalysis proposed for Java uses parameterised points-to escape graphs to identify memoryblocks escaping from a method.

In this paper, we consider inclusion-based analysis without strong updates. The points-to values passed into a procedure are fully parameterised in terms of (symbolic) virtualvariables. To the best of our knowledge, this is the first paper exploiting fully parame-terised pointer information on inclusion-based pointer analysis, resulting in (1) faster con-vergence facilitated by parameterised side-effect summarisation and (2) call-path-basedcontext-sensitivity in the special case when the formal parameters of every procedure arealias-free (Theorem 2).

Side-Effect Summarisation There are two earlier summary-based approaches to achiev-ing context sensitivity, by using RCI (Relevant Context Inference) [6] and PTFs (PartialTransfer Functions) [50], which were both originally proposed for flow-sensitive analysis.In the case of inclusion-based analysis, both are not as efficient as Icon. RCI builds onetransfer function for a procedure eagerly by assuming the presence of all possible aliasesat its entry, which is unnecessarily costly as revealed by (lack of) the aliasing relations inreal code given in Table 5.4. On the other hand, building multiple PTFs for a procedurelazily based on different aliasing combinations at its callsites is also unnecessarily costly,as this would require multiple constraint graphs to be constructed for the procedure. Byexploiting the absence of strong updates, Icon is designed to perform its analysis quicklyby summarising a procedure iteratively in-place, i.e., in the same constraint graph of theprocedure.

Pre-Analysis An analysis can be bootstrapped with the results of a prior analysis thatis faster but less precise [19, 17]. In [57], Steensgaard’s analysis is performed first toassign a level to each variable. Then the program is re-analysed level by level for greaterprecision. In [44], Andersen’s analysis is used to accelerate static detection of memory leaksfor C/C++ programs. In this paper, Icon is boosted by must-not-aliased information toaccelerate its convergence.

27

7 Conclusion

We introduce a new context-sensitive pointer analysis on top of Andersen’s analysis thatcan scale to millions of lines of C/C++ code, making it deployable in modern optimizingcompilers to drive advanced compiler optimisations. We have validated its scalabilityin Open64 using a total of 21 C/C++ programs (totalling 2.7 MLOC), by comparingit with the state-of-the art. By summarising the points-to side effects of a procedureparametrically using virtual variables, our analysis is significantly faster than the state-of-the-art while yielding nearly the same precision.

Bibliography

[1] M. Acharya and B. Robinson. Practical change impact analysis based on static pro-gram slicing for industrial software systems. In International Conference on SoftwareEngineering, pages 746–755, 2011.

[2] L. O. Andersen. Program analysis and specialization for the C programming language.PhD thesis, DIKU, University of Copenhagen, May 1994. (DIKU report 94/19).

[3] M. Berndl, O. Lhotak, F. Qian, L. Hendren, and N. Umanee. Points-to analysisusing BDDs. In ACM SIGPLAN conference on Programming Language Design andImplementation, volume 38, pages 103–114, 2003.

[4] Q. Cai and J. Xue. Optimal and efficient speculation-based partial redundancy elim-ination. In International Symposium on Code Generation and Optimization, pages91–104, 2003.

[5] G. J. Chaitin. Register allocation & spilling via graph coloring. In SIGPLAN’82:Proceedings of the SIGPLAN Symposium on Compiler Construction, pages 98–101,1982.

[6] R. Chatterjee, B. Ryder, and W. Landi. Relevant context inference. In ACMSIGPLAN-SIGACT Symposium on Principles of Programming Languages, page 146,1999.

[7] S. Cherem, L. Princehouse, and R. Rugina. Practical memory leak detection usingguarded value-flow analysis. In ACM SIGPLAN conference on Programming Lan-guage Design and Implementation, pages 480–491, 2007.

[8] F. Chow, S. Chan, S. Liu, R. Lo, and M. Streich. Effective representation of aliases andindirect memory operations in SSA form. In International Conference on CompilerConstruction, pages 253–267, 1996.

[9] M. Das, B. Liblit, M. Fahndrich, and J. Rehof. Estimating the impact of scalablepointer analysis on optimization. Static Analysis Symposium, pages 260–278, 2001.

[10] I. Dillig, T. Dillig, A. Aiken, and M. Sagiv. Precise and compact modular proce-dure summaries for heap manipulating programs. In ACM SIGPLAN conference onProgramming Language Design and Implementation, pages 567–577, 2011.

[11] M. Emami, R. Ghiya, and L. Hendren. Context-sensitive interprocedural points-to analysis in the presence of function pointers. In ACM SIGPLAN conference onProgramming Language Design and Implementation, pages 242–256, 1994.

28

[12] M. Fahndrich, J. Rehof, and M. Das. Scalable context-sensitive flow analysis usinginstantiation constraints. In ACM SIGPLAN conference on Programming LanguageDesign and Implementation, volume 35, pages 253–263, 2000.

[13] J. S. Foster, M. Fahndrich, and A. Aiken. Polymorphic versus monomorphic flow-insensitive points-to analysis for C. In Static Analysis Symposium, pages 175–198,2000.

[14] L. Gao, L. Li, J. Xue, and P.-C. Yew. Seed: A statically greedy and dynamicallyadaptive approach for speculative loop execution. IEEE Transactions on Computers,62(5):1004–1016, 2013.

[15] B. Hackett and A. Aiken. How is aliasing used in systems software? In ACM SIG-SOFT Symposium on the Foundations of Software Engineering, pages 69–80, 2006.

[16] B. Hardekopf and C. Lin. The ant and the grasshopper: fast and accurate pointeranalysis for millions of lines of code. In ACM SIGPLAN conference on ProgrammingLanguage Design and Implementation, pages 290–299, 2007.

[17] B. Hardekopf and C. Lin. Flow-sensitive pointer analysis for millions of lines of code.In International Symposium on Code Generation and Optimization, pages 289–298,2011.

[18] N. Heintze and O. Tardieu. Ultra-fast aliasing analysis using CLA: A million lines ofC code in a second. In ACM SIGPLAN conference on Programming Language Designand Implementation, volume 36, page 263, 2001.

[19] V. Kahlon. Bootstrapping: a technique for scalable flow and context-sensitive pointeralias analysis. In ACM SIGPLAN conference on Programming Language Design andImplementation, pages 249–259, 2008.

[20] W. Landi and B. Ryder. A safe approximate algorithm for interprocedural aliasing. InACM SIGPLAN conference on Programming Language Design and Implementation,volume 27, page 248, 1992.

[21] C. Lattner, A. Lenharth, and V. Adve. Making context-sensitive points-to analysiswith heap cloning practical for the real world. In ACM SIGPLAN conference onProgramming Language Design and Implementation, volume 42, page 289, 2007.

[22] O. Lhotak and L. Hendren. Context-sensitive points-to analysis: is it worth it? InInternational Conference on Compiler Construction, pages 47–64, 2006.

[23] L. Li, C. Cifuentes, and N. Keynes. Boosting the performance of flow-sensitive points-to analysis using value flow. In ACM SIGSOFT Symposium on the Foundations ofSoftware Engineering, pages 343–353, 2011.

[24] L. Li, C. Cifuentes, and N. Keynes. Precise and scalable context-sensitive pointeranalysis via value flow graph. In International Symposium on Memory Management,2013.

[25] L. Li, H. Feng, and J. Xue. Compiler-directed scratchpad memory management viagraph coloring. ACM Transactions on Architecture and Code Optimization, 6(3),2009.

29

[26] L. Li, L. Gao, and J. Xue. Memory coloring: A compiler approach for scratchpadmemory management. In International Conference on Parallel Architectures andCompilation Techniques, pages 329–338, 2005.

[27] D. Liang and M. Harrold. Efficient points-to analysis for whole-program analysis.In ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages199–215, 1999.

[28] D. Liang and M. Harrold. Efficient computation of parameterized pointer informationfor interprocedural analyses. In Static Analysis Symposium, pages 279–298, 2001.

[29] Y. Lu, L. Shang, X. Xie, and J. Xue. An incremental points-to analysis with CFL-reachability. In International Conference on Compiler Construction, pages 61–81,2013.

[30] R. Nasre and R.Govindarajan. Prioritizing constraint evaluation for efficient points-toanalysis. In International Symposium on Code Generation and Optimization, 2011.

[31] P. H. Nguyen and J. Xue. Interprocedural side-effect analysis and optimisation inthe presence of dynamic class loading. In Australasian Computer Science Conference,pages 9–18, 2005.

[32] E. Nystrom. FULCRA pointer analysis framework. PhD thesis, University of Illinoisat Urbana-Champaign, 2006.

[33] E. Nystrom, H. Kim, and W. Hwu. Bottom-up and top-down context-sensitivesummary-based pointer analysis. In Static Analysis Symposium, pages 165–180, 2004.

[34] D. Pearce, P. Kelly, and C. Hankin. Efficient field-sensitive pointer analysis of C.volume 30, pages 4–es, 2007.

[35] F. Pereira and D. Berlin. Wave propagation and deep propagation for pointer analysis.In International Symposium on Code Generation and Optimization, pages 126–135,2009.

[36] C. G. Quinones, C. Madrile, J. Sanchez, P. Marcuello, A. Gonzalez, and D. M.Tullsen. Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices. In ACM SIGPLAN conference on Programming Language Designand Implementation, pages 269–279.

[37] B. R. Rau. Iterative modulo scheduling: an algorithm for software pipelining loops.In International symposium on Microarchitecture, pages 63–74, New York, NY, USA,1994.

[38] L. Shang, Y. Lu, and J. Xue. Fast and precise points-to analysis with incrementalCFL-reachability summarisation: preliminary experience. In IEEE/ACM Interna-tional Conference on Automated Software Engineering, pages 270–273, 2012.

[39] L. Shang, X. Xie, and J. Xue. On-demand dynamic summary-based points-to analysis.In International Symposium on Code Generation and Optimization, 2012.

[40] Y. Smaragdakis, M. Bravenboer, and O. Lhotak. Pick your contexts well: under-standing object-sensitivity. In ACM SIGPLAN-SIGACT Symposium on Principlesof Programming Languages, pages 17–30, 2011.

30

[41] M. Sridharan and R. Bodık. Refinement-based context-sensitive points-to analysisfor Java. In ACM SIGPLAN conference on Programming Language Design and Im-plementation, pages 387–400, 2006.

[42] B. Steensgaard. Points-to analysis in almost linear time. In ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages, pages 32–41, 1996.

[43] Y. Sui, Y. Li, and J. Xue. Query-directed adaptive heap cloning for optimizingcompilers. In International Symposium on Code Generation and Optimization, 2013.

[44] Y. Sui, D. Ye, and J. Xue. Static memory leak detection using full-sparse value-flow analysis. In International Symposium on Software Testing and Analysis, pages254–264, 2012.

[45] Y. Sui, S. Ye, J. Xue, and P.-C. Yew. SPAS: scalable path-sensitive pointer analysison full-sparse SSA. In Asian Symposium on Programming Languages and Systems,pages 155–171, Berlin, Heidelberg, 2011.

[46] Q. Sun, J. Zhao, and Y. Chen. Probabilistic points-to analysis for java. In Interna-tional Conference on Compiler Construction, pages 62–81, 2011.

[47] N. Vachharajani, R. Rangan, E. Raman, M. J. Bridges, G. Ottoni, and D. I. August.Speculative decoupled software pipelining. In International Conference on ParallelArchitectures and Compilation Techniques, pages 49–59, 2007.

[48] J. Whaley and M. Lam. Cloning-based context-sensitive pointer alias analysis usingbinary decision diagrams. In ACM SIGPLAN conference on Programming LanguageDesign and Implementation, pages 131–144, 2004.

[49] J. Whaley and M. Rinard. Compositional pointer and escape analysis for Java pro-grams. In ACM Conference on Object-Oriented Programming, Systems, Languages,and Applications, pages 187–206, 1999.

[50] R. Wilson and M. Lam. Efficient context-sensitive pointer analysis for C programs. InACM SIGPLAN conference on Programming Language Design and Implementation,volume 30, page 12, 1995.

[51] X. Xiao and C. Zhang. Geometric encoding: forging the high performance contextsensitive points-to analysis for Java. In International Symposium on Software Testingand Analysis, pages 188–198, 2011.

[52] G. Xu and A. Rountev. Merging equivalent contexts for scalable heap-cloning-basedcontext-sensitive points-to analysis. In International Symposium on Software Testingand Analysis, pages 225–236, 2008.

[53] J. Xue and J. Knoop. A fresh look at partial redundancy elimination as a maximumflow problem. 26(2), 2006.

[54] J. Xue and P. H. Nguyen. Completeness analysis for incomplete object-orientedprograms. In International Conference on Compiler Construction, pages 271–286,2005.

[55] J. Xue, P. H. Nguyen, and J. Potter. Interprocedural side-effect analysis for incompleteobject-oriented software modules. Journal of Systems and Software, 80(1):92–105,2007.

31

[56] X. Yang, L. Wang, J. Xue, Y. Deng, and Y. Zhang. Comparability graph coloring foroptimizing utilization of stream register files in stream processors. In ACM SIGPLANSymposium on Principles and Practice of Parallel Programming, pages 111–120, 2009.

[57] H. Yu, J. Xue, W. Huo, X. Feng, and Z. Zhang. Level by level: making flow-andcontext-sensitive pointer analysis scalable for millions of lines of code. In InternationalSymposium on Code Generation and Optimization, pages 218–229, 2010.

[58] Z.-H.Du, C.-C. Lim, X.-F. Li, C. Yang, Q. Zhao, and T. F. Ngai. A cost-drivencompilation framework for speculative parallelization of sequential programs. In ACMSIGPLAN conference on Programming Language Design and Implementation, pages71–81, 2004.

[59] X. Zheng and R. Rugina. Demand-driven alias analysis for C. In ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 197–208, 2008.

[60] J. Zhu and S. Calman. Symbolic pointer analysis revisited. In ACM SIGPLANconference on Programming Language Design and Implementation, page 157, 2004.

32

making context-sensitive inclusion-based pointer analysis ...reports/papers/201315.pdfmaking...

Documents