Scalable Points-to Analysis. Rupesh Nasre. Advisor: Prof. R. Govindarajan. Comprehensive Examination. Jun 22, 2009.



Scalable Points-to Analysis.

Rupesh Nasre.

Advisor: Prof. R. Govindarajan.

Comprehensive Examination. Jun 22, 2009.


Outline.

Introduction (points-to analysis).

Issues involved in context-sensitive analyses.

Bloom filter.

Points-to analysis with bloom filter.

Experimental evaluation.

Future work.


What is Pointer Analysis?

Pointer analysis is the mechanism of statically finding out the possible run-time values of a pointer and the relation of various pointers with each other.


Why Pointer Analysis?

For parallelization:

    fun(p);
    fun(q);

For common subexpression elimination:

    x = p + 2;
    y = q + 2;

For dead code elimination:

    if (p == q) {
        fun();
    }

For other optimizations.


Introduction.

Flow sensitivity.

Context sensitivity.

Field sensitivity.

Unification based.

Inclusion based.


Flow sensitivity.

p = &x;

p = &y;

label:

...

flow-sensitive, at label: {(p, y)}.

flow-insensitive: {(p, x), (p, y)}.


Context sensitivity.

    caller1() {
        fun(&x);
    }

    caller2() {
        fun(&y);
    }

    fun(int *ptr) {
        a = ptr;
    }

context-insensitive: {(a, x), (a, y)}.

context-sensitive: {(a, x)} along call-path caller1,

{(a, y)} along call-path caller2.


Field sensitivity.

a.f = &x;

field-sensitive: {(a.f, x)}.

field-insensitive: {(a, x)}.


Unification based.

    one(&s1);
    one(&s2);
    two(&s3);

    one(struct s *p) {
        p->a = 3;
        two(p);
    }

    two(struct s *q) {
        q->b = 4;
    }

unification-based: {(p, &s1), (p, &s2), (p, &s3),

(q, &s1), (q, &s2), (q, &s3)}.


Inclusion based.

    one(&s1);
    one(&s2);
    two(&s3);

    one(struct s *p) {
        p->a = 3;
        two(p);
    }

    two(struct s *q) {
        q->b = 4;
    }

inclusion-based: {(p, &s1), (p, &s2),

(q, &s1), (q, &s2), (q, &s3)}.


Related work.

Scalable points-to analyses.

B. Steensgaard, Points-to Analysis in Almost Linear Time, POPL 1996.

J. Whaley and M. S. Lam, Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams, PLDI 2004.

B. Hardekopf and C. Lin, The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code, PLDI 2007.

V. Kahlon, Bootstrapping: a technique for scalable flow and context-sensitive pointer alias analysis, PLDI 2008.


Issues with context-sensitivity.

    main() {
        S1: f(&x);
        S2: f(&y);
    }

    f(a) {
        S3: g(a);
        S4: g(z);
    }

    g(b) {
        ...
    }

Invocation graph: main invokes f at S1 and at S2; each invocation of f invokes g at S3 and at S4, giving four invocations of g.

Exponential number of contexts.


Issues with context-sensitivity.

Storage requirement increases exponentially.

Along S1-S3-S5-S7, p points to {x1, x3, x5, x7}.

Along S1-S3-S5-S8, p points to {x1, x3, x5, x8}.

Along S1-S3-S6-S7, p points to {x1, x3, x6, x7}.

Along S1-S3-S6-S8, p points to {x1, x3, x6, x8}.

Along S1-S4-S5-S7, p points to {x1, x4, x5, x7}.

Along S1-S4-S5-S8, p points to {x1, x4, x5, x8}.

Along S1-S4-S6-S7, p points to {x1, x4, x6, x7}.

Along S1-S4-S6-S8, p points to {x1, x4, x6, x8}.

Along S2...
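The growth in the listing above can be checked with a small calculation. The sketch below assumes a chain of calls in which every level is reached from the same number of call sites (two per level, four levels, as the path names S1/S2 through S7/S8 suggest); the function name `count_contexts` is illustrative and not from the talk.

```c
#include <stdint.h>

/* Number of distinct call paths (contexts) when each of `depth` call
 * levels can be entered from `sites` different call sites.
 * Assumption: the same number of call sites at every level. */
uint64_t count_contexts(unsigned sites, unsigned depth)
{
    uint64_t n = 1;
    for (unsigned i = 0; i < depth; i++)
        n *= sites;   /* every extra level multiplies the number of paths */
    return n;
}
```

With two call sites per level and four levels this gives 16 contexts, each carrying its own points-to set for p.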


Tackling scalability issues.

How about not storing complete contexts?

How about storing approximate points-to information?

Can we have a probabilistic data structure that approximates the storage?

Can we control the false-positive rate?


Bloom filter.

A Bloom filter is a probabilistic data structure for membership queries, typically implemented as a fixed-size array of bits.

To store elements e1, e2, e3, bits at positions hash(e1), hash(e2) and hash(e3) are set.

[Figure: bit array with two bits set; e1 and e3 hash to the same position, e2 to a different one.]
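A minimal sketch of such a filter, assuming a 64-bit array and a single FNV-1a string hash; the names `bloom_t`, `bloom_add` and `bloom_may_contain` and the array size are illustrative choices, not taken from the talk.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOOM_BITS 64u   /* fixed array size; an illustrative choice */

typedef struct { uint64_t bits; } bloom_t;

/* FNV-1a string hash reduced to a bit position. Any hash would do. */
static unsigned bloom_hash(const char *key)
{
    uint32_t h = 2166136261u;
    for (; *key; key++) {
        h ^= (unsigned char)*key;
        h *= 16777619u;
    }
    return h % BLOOM_BITS;
}

/* Storing an element sets the bit at its hash position. */
void bloom_add(bloom_t *b, const char *key)
{
    b->bits |= (uint64_t)1 << bloom_hash(key);
}

/* false: definitely absent. true: possibly present, because two
 * elements may share a bit (a false positive). */
bool bloom_may_contain(const bloom_t *b, const char *key)
{
    return (b->bits >> bloom_hash(key)) & 1u;
}
```

A query never reports a stored element as absent; the only error it can make is reporting an element that was never stored.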


Points-to analysis with Bloom filter.

A constraint is an abstract representation of a pointer instruction.

p = &x: p.pointsTo(x).

p = q: p.copyFrom(q).

*p = q: p.storeThrough(q).

p = *q: p.loadFrom(q).

Function arguments and return values are treated like statements of the form p = q.

Note that each constraint also stores its context.


Points-to Analysis with Bloom filter.

If points-to pairs (p, x) are kept in a Bloom filter, existential queries such as “does p point to x?” can be answered.

What about queries such as “do p and q alias?”

What about context-sensitive queries such as “do p and q alias in context c?”

How to process assignment statements p = q?

How about load/store statements *p = q and q = *p?


Multi-Bloom filter.

Points-to pairs are kept in a bloom filter per pointer. A bit set to 1 represents a points-to pair.

Example (diagram on the next slide):

Points-to pairs {(p, x), (p, y), (q, x)}.

hash(x) = 5, hash(y) = 6.

Set bit numbers: p.bucket[5], p.bucket[6], q.bucket[5].

Can hash(x) and hash(y) be the same? Yes. The two pointees then share a bit, which is a source of false positives.


Multi-Bloom filter.

[Figure: p.bucket with bits 5 and 6 set, representing (p, x) and (p, y); q.bucket with bit 5 set, representing (q, x). All other bits are 0.]


Multi-Bloom filter.

Each pointer has a fixed number of bits for storing its points-to information, called a bucket.

Thus, if bucket size == 10, all pointees are hashed to a value from 0 to 9.

This notion is extended to multiple buckets per pointer, one for each hash function.
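The per-pointer layout described above might look as follows. This is a sketch under assumptions: two hash functions, the bucket size of 10 from the text, integer pointee ids, and a made-up hash family (`hash_pointee`); none of these names come from the talk.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_HASHES  2    /* buckets per pointer, one per hash function */
#define BUCKET_BITS 10   /* bucket size from the text: hash values 0 to 9 */

typedef struct {
    uint16_t bucket[NUM_HASHES];   /* low BUCKET_BITS bits of each are used */
} pointer_info;

/* Hypothetical hash family over integer pointee ids. */
static unsigned hash_pointee(unsigned which, unsigned pointee_id)
{
    static const unsigned mult[NUM_HASHES] = { 7u, 13u };
    return (pointee_id * mult[which] + which) % BUCKET_BITS;
}

/* Record the points-to pair (p, pointee): set one bit in every bucket. */
void add_pointee(pointer_info *p, unsigned pointee_id)
{
    for (unsigned i = 0; i < NUM_HASHES; i++)
        p->bucket[i] |= (uint16_t)(1u << hash_pointee(i, pointee_id));
}

/* "Does p point to pointee?": the bit must be set in every bucket. */
bool may_point_to(const pointer_info *p, unsigned pointee_id)
{
    for (unsigned i = 0; i < NUM_HASHES; i++)
        if (!(p->bucket[i] & (1u << hash_pointee(i, pointee_id))))
            return false;   /* a clear bit in any bucket rules the pointee out */
    return true;
}
```

Using more than one bucket lowers the false-positive rate, since two pointees must collide under every hash function to be confused.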


Handling p = q.

Points-to set of q should be added to the points-to set of p.

Bitwise-OR each bucket of q with the corresponding bucket of p.

Example on the next slide.


Example.

h1(x) = 0, h2(x) = 5, h1(y) = 3, h2(y) = 3.
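The slide's figure is not part of the transcript, so the following sketch replays the copy rule with the hash values given above. It assumes, purely for illustration, that p initially points to x and q to y; the type and function names are hypothetical.

```c
#include <stdint.h>

#define NUM_HASHES 2

typedef struct { uint16_t bucket[NUM_HASHES]; } pointer_info;

enum { X, Y };   /* pointee ids */

/* Hash values from the example: h1(x)=0, h2(x)=5, h1(y)=3, h2(y)=3. */
static const unsigned hash_table[NUM_HASHES][2] = {
    { 0, 3 },   /* h1(x), h1(y) */
    { 5, 3 },   /* h2(x), h2(y) */
};

/* p = &pointee: set the pointee's bit in each bucket of p. */
void add_pointee(pointer_info *p, int pointee)
{
    for (int i = 0; i < NUM_HASHES; i++)
        p->bucket[i] |= (uint16_t)(1u << hash_table[i][pointee]);
}

/* p = q: OR each bucket of q into the corresponding bucket of p. */
void copy_from(pointer_info *p, const pointer_info *q)
{
    for (int i = 0; i < NUM_HASHES; i++)
        p->bucket[i] |= q->bucket[i];
}
```

Under that assumption, after p = q the first bucket of p has bits 0 and 3 set and the second has bits 5 and 3 set, while q is unchanged.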


Handling p = *q and *p = q.

Extend multi-bloom to have another dimension for pointers pointed to by pointers.

The idea can be extended to higher-level pointers (***p, ****p, and so on).

We implemented it only for two-level pointers.

Example on the next slide.


Another example.

h(x) = 1, h(y) = 4, hs(p1) = 1, hs(p2) = 2.


Alias query: context-sensitive.

If the query is DoAlias(p, q, c),

    for each hash function ii {
        hasPointee = false;
        for each bucket-bit jj
            if (p.bucket[c][ii][jj] and q.bucket[c][ii][jj])
                hasPointee = true;
        if (hasPointee == false)
            return NoAlias;
    }
    return MayAlias;


Alias query: context-insensitive.

If the query is DoAlias(p, q),

    for each context c {
        if (DoAlias(p, q, c) == MayAlias)
            return MayAlias;
    }
    return NoAlias;
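Both queries can be sketched in C by storing each bucket as a machine word, so that the inner loop over bucket bits becomes a single bitwise AND. The sizes (4 contexts, 2 hash functions, 32-bit buckets) and the names below are assumptions made for the sketch.

```c
#include <stdint.h>

#define NUM_CONTEXTS 4   /* illustrative sizes */
#define NUM_HASHES   2

typedef enum { NoAlias, MayAlias } alias_result;

typedef struct {
    uint32_t bucket[NUM_CONTEXTS][NUM_HASHES];   /* one bit vector per context and hash */
} pointer_info;

/* Context-sensitive query: p and q may alias in context c only if,
 * for every hash function, their buckets share at least one set bit. */
alias_result do_alias_in_context(const pointer_info *p,
                                 const pointer_info *q, int c)
{
    for (int i = 0; i < NUM_HASHES; i++)
        if ((p->bucket[c][i] & q->bucket[c][i]) == 0)
            return NoAlias;   /* no common bit under hash i */
    return MayAlias;
}

/* Context-insensitive query: MayAlias if any single context says so. */
alias_result do_alias(const pointer_info *p, const pointer_info *q)
{
    for (int c = 0; c < NUM_CONTEXTS; c++)
        if (do_alias_in_context(p, q, c) == MayAlias)
            return MayAlias;
    return NoAlias;
}
```

A NoAlias answer is exact with respect to the stored sets, whereas MayAlias may be a false positive caused by hash collisions.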


Experimental evaluation: Time.

Benchmark   exact      (40-10-4)   (80-10-8)   (400-100-12)   (800-100-16)
gcc         OOM        791.705     3250.627    10237.702      27291.303
perlbmk     OOM        76.277      235.207     2632.044       5429.385
vortex      OOM        95.934      296.995     1998.501       4950.321
eon         231.166    39.138      118.947     1241.602       2639.796
parser      55.359     9.469       31.166      145.777        353.382
gap         144.181    5.444       17.469      152.102        419.392
vpr         29.702     5.104       18.085      88.826         211.065
crafty      20.469     2.636       9.069       46.899         109.115
mesa        1.472      1.384       2.632       10.041         23.721
ammp        1.120      1.008       2.592       15.185         38.018
twolf       0.596      0.656       1.152       5.132          12.433
gzip        0.348      0.192       0.372       1.808          4.372
bzip2       0.148      0.144       0.284       1.348          3.288
mcf         0.112      0.332       0.820       5.036          12.677
equake      0.224      0.104       0.236       1.104          2.652
art         0.168      0.164       0.408       2.404          6.132
httpd       17.445     7.180       15.277      52.793         127.503
sendmail    5.956      3.772       6.272       25.346         65.889


Experimental evaluation: Memory.

Benchmark   exact        (40-10-4)   (80-10-8)   (400-100-12)   (800-100-16)
gcc         OOM          3955.39     15444.90    113576.00      302117.00
perlbmk     OOM          1880.87     7344.33     54007.70       143662.00
vortex      OOM          817.89      3193.65     23485.00       62471.00
eon         385283.89    1722.32     6725.23     49455.00       131552.00
parser      121587.5     564.19      2203.01     16200.20       43093.10
gap         97862.67     1106.97     4322.47     31785.90       84551.70
vpr         50209.5      309.98      1210.40     8900.88        23676.60
crafty      15985.59     142.59      556.78      4094.37        10891.20
mesa        8260.22      720.95      2815.14     20701.60       55066.90
ammp        5843.27      200.09      781.30      5745.38        15282.90
twolf       1593.69      440.73      1720.93     12655.20       33663.10
gzip        1446.47      41.96       163.84      1204.79        3204.79
bzip2       518.88       30.56       119.31      877.37         2333.82
mcf         219.57       49.20       192.13      1412.83        3758.17
equake      160.13       52.00       203.03      1493.03        3971.50
art         41.46        22.18       86.59       636.77         1693.82
httpd       225512.89    3058.17     11941.40    87813.10       233586.00
sendmail    197382.28    1672.88     6532.20     48035.60       127776.00


Experimental evaluation: Precision.

Benchmark   exact   (40-10-4)   (80-10-8)   (400-100-12)   (800-100-16)
gcc         OOM     71.8        79.6        83.4           85.3
perlbmk     OOM     75.3        85.0        89.3           90.6
vortex      OOM     85.7        90.1        91.2           91.5
eon         96.8    81.5        88.9        94.3           96.8
parser      98.0    65.8        97.3        97.9           98.0
gap         97.5    88.2        93.5        96.7           97.4
vpr         94.2    85.9        93.9        94.1           94.2
crafty      97.6    97.1        97.6        97.6           97.6
mesa        99.4    89.6        96.6        99.1           99.4
ammp        99.2    98.4        99.0        99.2           99.2
twolf       99.3    96.7        99.1        99.3           99.3
gzip        90.9    88.8        90.5        90.8           90.9
bzip2       88.0    84.8        88.0        88.0           88.0
mcf         94.5    91.3        94.3        94.5           94.5
equake      97.7    96.9        97.7        97.7           97.7
art         88.6    86.6        88.4        88.6           88.6
httpd       93.2    90.1        92.1        92.9           93.2
sendmail    90.4    85.6        88.2        90.3           90.4


Summary.

Bloom filters offer an effective way to represent points-to information.

Precision can be close to that of the exact analysis while still saving storage.

Parameters can be configured to suit an application's needs.


Future work.

Flow-sensitive analysis using counting bloom filter.

need to support kill operation.

require resetting bits.

may introduce false negatives.

storage requirement of flow-sensitive analysis is an issue.
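To make the kill operation and its risk concrete, here is a minimal counting Bloom filter sketch. It is a generic illustration of the data structure, not the design proposed in the talk; the counter width, the table size, the hash in `slot` and the `cb_*` names are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_COUNTERS 16u   /* illustrative size */

/* Each bit of a plain Bloom filter is replaced by a small counter. */
typedef struct { uint8_t count[NUM_COUNTERS]; } counting_bloom;

/* Hypothetical hash over integer pointee ids. */
static unsigned slot(unsigned pointee_id)
{
    return (pointee_id * 7u) % NUM_COUNTERS;
}

/* gen: add a pointee by incrementing its counter (saturating). */
void cb_gen(counting_bloom *f, unsigned id)
{
    if (f->count[slot(id)] < UINT8_MAX)
        f->count[slot(id)]++;
}

/* kill: remove a pointee by decrementing its counter. */
void cb_kill(counting_bloom *f, unsigned id)
{
    if (f->count[slot(id)] > 0)
        f->count[slot(id)]--;
}

bool cb_may_contain(const counting_bloom *f, unsigned id)
{
    return f->count[slot(id)] > 0;
}
```

In this sketch ids 1 and 17 hash to the same counter, so killing 17 when only 1 was added makes the filter report 1 as absent, which is the kind of false negative the slide warns about.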


Future work.

Adaptive bloom filter parameters.

not all pointers require same number of bits.

bits saved from one pointer can be used by another.

storage required for counters representing number of bits for each pointer.

bitwise operations are not straightforward.


Future work.

Efficient flow-insensitive analysis.

approach similar to wave/deep propagation¹.

similarity with flow-sensitive analysis.

preliminary results show that the number of iterations to reach a fix-point can be reduced, e.g., on an example set of programs, the total number of iterations is reduced from 180 (deep) to 148.

¹ F. M. Q. Pereira and D. Berlin, Wave Propagation and Deep Propagation for Pointer Analysis, CGO 2009.


Scalable Points-to Analysis.

Rupesh Nasre.

Advisor: Prof. R. Govindarajan.

Comprehensive Examination. Jun 22, 2009.


Approach.

Start from main().

Add constraints for each pointer statement.

Flow-insensitive.

Jump to the called function, process it and return. Continue with the caller. A function called multiple times is processed multiple times.

Keep context along with each constraint.

Recursion is handled by iterating over the cycle until a fix-point. This is context-insensitive.

At the end, iterate over constraints to extract points-to information in context-sensitive manner. Iteration makes it flow-insensitive.


Example.

    main() {
        S1: r1 = f(p)
        S2: r2 = f(q)
        S3: r3 = g(p)
        S4: r4 = h()
    }

    f(a) {
        return a
    }

    g(b) {
        c = b
    }

    h() {
        p = &x
        q = &y
    }

Only realizable paths.

Along main-S1, r1 points to x.

Along main-S2, r2 points to y.

Even though main-S3 and main-S4 are different contexts, we merge context-information. Thus, c points to x.

Since main-S1 and main-S2 call the same function, we do not merge the information. Thus, r1 does not alias with r2.


Experimental evaluation: Mod/Ref.

Benchmark   exact   (40-10-4)   (80-10-8)   (400-100-12)   (800-100-16)
gcc         OOM     19.5        24.6        28.9           29.5
perlbmk     OOM     16.3        24.7        28             29.1
vortex      OOM     25.9        35.7        38.4           39.1
eon         41.2    38.1        39.8        40.9           41.2
parser      29.9    26.7        28.3        28.8           28.9
gap         51.1    48.4        49.8        50.7           51
vpr         73.5    71.4        72.9        73.4           73.5
crafty      93.4    93.2        93.4        93.4           93.4
mesa        25.6    22.9        24.3        25.2           25.6
ammp        57.9    56.3        57.5        57.8           57.9
twolf       69.1    67.3        68.6        68.9           69.1
gzip        30      29.8        29.9        30             30
bzip2       44.7    44.5        44.7        44.7           44.7
mcf         51.3    50.8        51.3        51.3           51.3
equake      95.2    93.7        94.9        95.2           95.2
art         64.8    64.4        64.7        64.8           64.8
httpd       52.9    50.6        51.8        52.8           52.9
sendmail    43.3    40.8        42          43.1           43.3