the ant and the grasshopper fast and accurate pointer analysis for millions of lines of code

111
The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code Ben Hardekopf and Calvin Lin PLDI 2007 (Best Paper & Best Presentation Award)

Upload: jalen

Post on 22-Feb-2016

42 views

Category:

Documents


0 download

DESCRIPTION

The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code. Ben Hardekopf and Calvin Lin PLDI 2007 (Best Paper & Best Presentation Award). Contributions of the Paper. Identify current state-of-the-art Compare 3 well-known algorithms - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

The Ant and The GrasshopperFast and Accurate Pointer Analysis for Millions

of Lines of Code

Ben Hardekopf and Calvin Lin

PLDI 2007

(Best Paper & Best Presentation Award)

Page 2: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Contributions of the Paper

Identify current state-of-the-art Compare 3 well-known algorithms Fastest takes ½ hour to analyze 1M lines of C

Advance the state-of-the-art Two techniques(The Ant and The

Grasshopper) for inclusion-based analysis Over 3x faster, same precision Will be incorporated into GCC

Page 3: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

The Agenda

Background Lazy Cycle Detection Hybrid Cycle Detection Evaluation Q & A

Page 4: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Why Pointer Analysis ?

Pointer information - vital for most program analyses like program verification and program understanding.

Precise pointer analysis is NP hard.

The most precise analyses are flow sensitive and context sensitive, but do not scale to large programs.

Page 5: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Pointer Analysis - Simplified

• Flow-sensitive analysis computes a different graph at each program point. But this can be quite expensive.

• Solution: Flow-insensitive analysis - compute a points-to relation which is the least upper bound of all the points-to relations computed by the flow-sensitive analysis

• Compute a SINGLE points-to relation that holds regardless of the order in which assignment statements are actually executed

• “consider all the assignment statements together, replacing strong updates in dataflow equations with weak updates”

Lets see some equations!

Page 6: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Dataflow Equations – Flow-Sensitive

x := &yG

G’ = G with pt’(x) {y}

x := yG

G’ = G with pt’(x) pt(y)

x := *yG

G’ = G with pt’(x) U pt(a) for all a in pt(y)

*x := yG

G’ = G with pt’(a) U pt(y) for all a in pt(x)

strong updates weak update

Page 7: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Dataflow Equations – Flow-Insensitive

Statements

x := &yG

G = G with pt(x) U {y}

x := yG

G = G with pt(x) U pt(y)

x := *yG

G = G with pt(x) U pt(a) for all a in pt(y)

*x := yG

G = G with pt(a) U pt(y) for all a in pt(x)

weak updates only

Page 8: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Set Constraints

Statements

)(xpty

x := &y

x := y

x := *y

*x := y

)()( yptxpt )()().( yptaptxpta

)()().( aptxptypta

Base Simple

Complex1 Complex2

Page 9: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

• Generate constraints from the code• Build a constraint graph

• Nodes: variables• Edges: inclusion constraints

• Add indirect constraints • Pointer dereference• Recursively compute transitive closure of

the graph

Page 10: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c = &f; e = &c; g = &a; a = d; b = a; d = *e;*e = b;*g = e;

Page 11: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis c = &f; e = &c; g = &a; a = d; b = a; d = *e;*e = b;*g = e;

c {f} e {c} g {a} a d b a d *e *e b *g e

Page 12: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c = &f; e = &c; g = &a; a = d; b = a; d = *e;*e = b;*g = e;

c {f} e {c} g {a} a d b a d *e *e b *g e

Page 13: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c = &f; e = &c; g = &a; a = d; b = a; d = *e;*e = b;*g = e;

c {f} e {c} g {a} a d b a d *e *e b *g e

Page 14: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c = &f; e = &c; g = &a; a = d; b = a; d = *e;*e = b;*g = e;

c {f} e {c} g {a} a d b a d *e *e b *g e

Page 15: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

Page 16: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

e

g

f

d

a

c

b

Constraint Graph

Page 17: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

e

g

f

d

a

b

Constraint Graph

cf

Page 18: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

d

a

cf

b

Constraint Graph

Page 19: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

d

a

cf

b

Constraint Graph

Page 20: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bd

a

cf

Constraint Graph

Page 21: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bd

a

cf

Constraint Graph

Page 22: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bd

a

cf

Constraint Graph

Page 23: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bd

a

cf

Constraint Graph

Page 24: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bd

a

cf

Constraint Graph

Page 25: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bd

a

cf

Constraint Graph

Page 26: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bdf

a

cf

Constraint Graph

Page 27: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bdf

af

cf

Constraint Graph

Page 28: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 29: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 30: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 31: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 32: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 33: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 34: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 35: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 36: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 37: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af,c

cf

Constraint Graph

Page 38: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf,c

df

af,c

cf

Constraint Graph

Page 39: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf,c

df

af,c

cf,c

Constraint Graph

Page 40: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf,c

df,c

af,c

cf,c

Constraint Graph

Page 41: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Inclusion-based Analysis

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf,c

df,c

af,c

cf,c

Constraint Graph

Page 42: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Online Cycle Detection

Inclusion-based analysis is O(n3)

Optimize with online cycle detection All nodes in the same cycle will have identical

points-to sets Most cycles appear during the analysis as

new edges are added

Page 43: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Background: Online Cycle Detection

Cycle detection mechanism will largely determine performance of analysis

Must carefully balance aggression versus overhead Too aggressive → too much graph traversal Too conservative → cycles found too late

Page 44: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Contributions

Two new techniques for cycle detection Lazy Cycle Detection Hybrid Cycle Detection

Techniques are complementary Hybrid Cycle Detection can be composed

with any other cycle detection technique

Page 45: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Contributions

Two new techniques for cycle detection Lazy Cycle Detection Hybrid Cycle Detection

Techniques are complementary Hybrid Cycle Detection can be composed

with any other cycle detection technique

Page 46: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

Well-known fact: A cycle forces nodes to have identical

points-to sets Key Insight: Nodes with identical points-to sets

indicate possible cycles Balance aggression and overhead by waiting

for the effect of the cycle (identical points-to sets) to become obvious

Page 47: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bd

a

cf

Constraint Graph

Page 48: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bd

a

cf

Constraint Graph

Page 49: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bd

a

cf

Constraint Graph

Page 50: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bd

a

cf

Constraint Graph

Page 51: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bdf

a

cf

Constraint Graph

Page 52: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bdf

af

cf

Constraint Graph

Page 53: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 54: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 55: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 56: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 57: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 58: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 59: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 60: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bf

df

af

cf

Constraint Graph

Page 61: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f

Constraint Graph

Page 62: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f

Constraint Graph

Page 63: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f

Constraint Graph

Page 64: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f

Constraint Graph

Page 65: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f,c

Constraint Graph

Page 66: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f,c

Constraint Graph

Page 67: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection

IS LAZY because cycles are detected only while propagating constraints to them (well after they are created)

Nodes with identical points-to sets MAY NOT be part of a cycle

An Additional heuristic is needed to stop this wasteful search : Don’t trigger cycles detection on the same edge twice

=> Cycle detection is not guaranteed to find all cycles

Page 68: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Contributions

Two new techniques for cycle detection Lazy Cycle Detection Hybrid Cycle Detection

Techniques are complementary Hybrid Cycle Detection can be composed

with any other cycle detection technique

Page 69: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detectionfinds few cycles finds many cycles

cheap

expensive

OfflineCycle Detection

Rountev and Chandra, Off-line Variable Substitution for Scaling Points-toAnalysis, in PLDI 2000.

Page 70: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detectionfinds few cycles finds many cycles

cheap

expensive

OfflineCycle Detection

OnlineCycle Detection

Fähndrich et al, Partial Online Cycle Elimination in Inclusion ConstraintGraphs, in PLDI 1998.

Page 71: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection

Key Insight: combining offline and online techniques can give us the best of both worldsPS: Offline – before actual constraint graph traversal

finds few cycles finds many cycles

cheap

expensive

OfflineCycle Detection

OnlineCycle Detection

HybridCycle Detection

Page 72: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection

Eagerly finds cycles without traversing constraint graph

Not guaranteed to find all cycles 46‒74% in these benchmarks

Can be combined with other cycle detection techniques

Page 73: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection

Offline component

Online component

Page 74: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Offline

• Linear time static analysis prior to actual pointer analysis.

• Uses a simpler offline constraint graph • a node for each variable• a ref node for variable dereference• an edge for each simple, complex constr. • ignore base constraints

• Detect cycles in this graph using Tarjan’s Algorithm

• Key: No need to perform transitive closure here!

Page 75: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Offline

c {f} e {c} g {a} a d b a d *e *e b *g e

Page 76: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Offline

c {f} e {c} g {a} a d b a d *e *e b *g e

Ignore Base Constraints!

Page 77: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Offline

c {f} e {c} g {a} a d b a d *e *e b *g e

Page 78: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Offline

c {f} e {c} g {a} a d b a d *e *e b *g e

e

*g d

a

*e

b

Offline Constraint Graph

Page 79: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Offline

c {f} e {c} g {a} a d b a d *e *e b *g e

e

*g d

a

*e

b

Offline Constraint Graph

Page 80: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Offline

c {f} e {c} g {a} a d b a d *e *e b *g e

e

*g d

a

*e

b

Offline Constraint Graph

Page 81: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Offline

c {f} e {c} g {a} a d b a d *e *e b *g e

e

*g d

a

*e

b

Offline Constraint Graph

Page 82: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Offline

c {f} e {c} g {a} a d b a d *e *e b *g e

e

*g d

a

*e

b

Offline Constraint Graph

Page 83: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Offline

c {f} e {c} g {a} a d b a d *e *e b *g e

e

*g d

a

*e

b

e → {a,b,d}

Offline Constraint Graph

Page 84: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection

Offline component

Online component

Page 85: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Online

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bd

a

cf

Constraint Graph

e → {a,b,d}

Page 86: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Online

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bd

a

cf

Constraint Graph

e → {a,b,d}

Page 87: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Online

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

bd

a

cf

Constraint Graph

e → {a,b,d}

Page 88: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Online

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f

Constraint Graph

e → {a,b,d}

Page 89: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Online

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f

Constraint Graph

e → {a,b,d}

Page 90: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Online

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f

Constraint Graph

e → {a,b,d}

Page 91: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Online

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f

Constraint Graph

e → {a,b,d}

Page 92: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Online

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f

Constraint Graph

e → {a,b,d}

Page 93: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Online

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f

Constraint Graph

e → {a,b,d}

Page 94: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Online

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f

Constraint Graph

e → {a,b,d}

Page 95: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Online

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f

Constraint Graph

e → {a,b,d}

Page 96: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Online

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f

Constraint Graph

e → {a,b,d}

Page 97: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Online

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f

Constraint Graph

e → {a,b,d}

Page 98: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Online

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f,c

Constraint Graph

e → {a,b,d}

Page 99: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Hybrid Cycle Detection ‒ Online

c {f} e {c} g {a} a d b a d *e *e b *g e

ec

ga

f

a/b/c/d

f,c

Constraint Graph

e → {a,b,d}

Page 100: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

EvaluationCompare against 3 well-known algorithms

• Heintze and Tardieu [PLDI'01]• Pearce et al [PASTE'04]• Berndl et al [PLDI'03]

All algorithms compute the exact same solution

Six benchmarks: 100K—2M LOC • Emacs, Ghostscript, Gimp, Insight, Wine, Linux

Kernel

Page 101: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Berndl et al Pearce et al Heintze et al LCD+HCD0

2

4

6

8

10

12

14

16

18Normalized Speedup

EmacsGhostscript GimpInsightWineLinux

Performance Comparison

59

Page 102: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Berndl et al Pearce et al Heintze et al LCD+HCD0

2

4

6

8

10

12

14

16

18Normalized Speedup

EmacsGhostscript GimpInsightWineLinux

Performance Comparison

59

Page 103: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Concluding Remarks• 20x faster than Berndl et al, 6x faster

than Pearce et al and 3x faster than Heintze et al

• The paper also looks at different data structures to efficiently represent points-to sets

• Sparse-bitmap (used in GCC)• Binary Decision Trees (BDD)

• BDD implementation is 2x slower on average, but used 5.5X less memory

• Question: They DO NOT compare with IDEAL. Is there further opportunity here ?

Page 104: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Backup: Transitive Closure Algorithm

Page 105: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Lazy Cycle Detection Algorithm

Page 106: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Heintze and Tardieu [PLDI'01]

• Online algorithm • During construction, as new inclusion edges are added

to the graph, the transitive edges are NOT added• During analysis, indirect constraints are resolved

through REACHABILITY queries. • => lots of redundant queries

• In their paper, they reported results for field based implementation. When fields are expanded, the algorithm is dramatically slow.

Page 107: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Pearce et al [PASTE'04]

• Two variants

• First:• Maintain topological order of the graph• A newly inserted edge that violates this ordering COULD

create a cycle, so check whenever this happens.

• Second• Periodic sweep of the constraint graph to detect and collapse

cycles.

Page 108: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Berndl et al [PLDI'03]

• Field sensitive inclusion-based pointer analysis for JAVA programs

• Uses BDDs to represent graph and points-to sets

• This paper extends this algorithm by • Making it field insensitive • Handle indirect function calls

Page 109: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Benchmarks

Constraints reduced using offline variable substitution(60-77%)

Page 110: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Memory Consumption

Page 111: The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code

Observations

• Number of nodes collapsedo Reduces nodes and edges in graph

• Number of nodes searched in DFSo Overhead due to cycle detection

• Number of points-to info propogationo Expensive operation