analyses and optimizations for multithreaded programs

Analyses and Optimizations for Multithreaded Programs Martin Rinard, Alex Salcianu, Brian Demsky MIT Laboratory for Computer Science John Whaley IBM Tokyo Research Laboratory

Upload: clint

Post on 24-Feb-2016


Page 1: Analyses and Optimizations for Multithreaded Programs

Analyses and Optimizations for Multithreaded Programs

Martin Rinard, Alex Salcianu, Brian Demsky
MIT Laboratory for Computer Science

John Whaley
IBM Tokyo Research Laboratory

Page 2: Analyses and Optimizations for Multithreaded Programs

Motivation

• Threads are Ubiquitous
  • Parallel Programming for Performance
  • Manage Multiple Connections
  • System Structuring Mechanism
• Overhead
  • Thread Management
  • Synchronization
• Opportunities
  • Improved Memory Management

Page 3: Analyses and Optimizations for Multithreaded Programs

What This Talk is About

• New Abstraction: Parallel Interaction Graph
  • Points-To Information
  • Reachability and Escape Information
  • Interaction Information
    • Caller-Callee Interactions
    • Starter-Startee Interactions
  • Action Ordering Information
• Analysis Algorithm
• Analysis Uses (synchronization elimination, stack allocation, per-thread heap allocation)

Page 4: Analyses and Optimizations for Multithreaded Programs

Outline

• Example
• Analysis Representation and Algorithm
• Lightweight Threads
• Results
• Conclusion

Page 5: Analyses and Optimizations for Multithreaded Programs

Sum Sequence of Numbers

9 8 1 5 3 7 2 6

Page 6: Analyses and Optimizations for Multithreaded Programs

Group in Subsequences

9 8 1 5 3 7 2 6

Page 7: Analyses and Optimizations for Multithreaded Programs

Sum Subsequences (in Parallel)

9 8 1 5 3 7 2 6

[Figure: each subsequence is summed independently: 9 + 8 = 17, 1 + 5 = 6, 3 + 7 = 10, 2 + 6 = 8]

Page 8: Analyses and Optimizations for Multithreaded Programs

Add Sums Into Accumulator

9 8 1 5 3 7 2 6

[Figure, animated over five slides: the partial sums 17, 6, 10, and 8 are added to the shared accumulator one at a time. Accumulator value on successive slides: 0, 17, 23, 33, 41]

Page 13: Analyses and Optimizations for Multithreaded Programs

Common Schema

• Set of tasks
• Chunk tasks to increase granularity
• Tasks have both
  • Independent computation
  • Updates to shared data

Page 14: Analyses and Optimizations for Multithreaded Programs

Realization in Java

class Accumulator {
  int value = 0;
  synchronized void add(int v) { value += v; }
}

Page 15: Analyses and Optimizations for Multithreaded Programs

Realization in Java

class Task extends Thread {
  Vector work;
  Accumulator dest;
  Task(Vector w, Accumulator d) { work = w; dest = d; }

  public void run() {
    int sum = 0;
    Enumeration e = work.elements();
    while (e.hasMoreElements())
      sum += ((Integer) e.nextElement()).intValue();
    dest.add(sum);
  }
}

[Figure, two slides: a Task object whose work field references a Vector (labeled 6 2) and whose dest field references an Accumulator (value 0); the second slide adds the Enumeration object created in run]

Page 17: Analyses and Optimizations for Multithreaded Programs

Realization in Java

void generateTask(int l, int u, Accumulator a) {
  Vector v = new Vector();
  for (int j = l; j < u; j++)
    v.addElement(new Integer(j));
  Task t = new Task(v, a);
  t.start();
}

// Task i covers the half-open range [i*m, (i+1)*m), so each task gets m elements.
void generate(int n, int m, Accumulator a) {
  for (int i = 0; i < n; i++)
    generateTask(i*m, (i+1)*m, a);
}
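The Accumulator, Task, and generate fragments can be assembled into one runnable program. The following is a minimal sketch, not the authors' code: the SumDemo class, the get() accessor, the generic type parameters, and the join() calls are additions made here so the total can be read back safely, and it assumes each task i covers the range [i*m, (i+1)*m).

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Vector;

class Accumulator {
    private int value = 0;
    synchronized void add(int v) { value += v; }
    synchronized int get() { return value; }  // accessor added for this demo
}

class Task extends Thread {
    private final Vector<Integer> work;
    private final Accumulator dest;

    Task(Vector<Integer> w, Accumulator d) { work = w; dest = d; }

    @Override
    public void run() {
        int sum = 0;                          // independent computation
        Enumeration<Integer> e = work.elements();
        while (e.hasMoreElements()) sum += e.nextElement();
        dest.add(sum);                        // update to shared data
    }
}

class SumDemo {
    // Starts n tasks; task i sums the integers in [i*m, (i+1)*m).
    static int sumInParallel(int n, int m) {
        Accumulator a = new Accumulator();
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Vector<Integer> v = new Vector<>();
            for (int j = i * m; j < (i + 1) * m; j++) v.addElement(j);
            Task t = new Task(v, a);
            tasks.add(t);
            t.start();
        }
        try {
            for (Task t : tasks) t.join();    // wait until every task has added its sum
        } catch (InterruptedException ex) {
            throw new IllegalStateException(ex);
        }
        return a.get();
    }

    public static void main(String[] args) {
        System.out.println(sumInParallel(4, 2));  // sums 0..7
    }
}
```

Each Vector is filled by the parent before start() and read only by its own task, while the Accumulator is the only object shared between running threads.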

Page 18: Analyses and Optimizations for Multithreaded Programs

Task Generation

[Figure, animated over eight slides: starting from an Accumulator with value 0, each step creates a Vector, fills it with elements, and then creates a Task whose work field references that Vector and whose dest field references the shared Accumulator. The final slide shows three Tasks with Vectors labeled 6 2, 9 8, and 5 1, all pointing to the same Accumulator]

Page 26: Analyses and Optimizations for Multithreaded Programs

Analysis

Page 27: Analyses and Optimizations for Multithreaded Programs

Analysis Overview

• Interprocedural
• Interthread
• Flow-sensitive
  • Statement ordering within thread
  • Action ordering between threads
• Compositional, Bottom Up
• Explicitly Represent Potential Interactions Between Analyzed and Unanalyzed Parts
• Partial Program Analysis

Page 28: Analyses and Optimizations for Multithreaded Programs

Analysis Result for run Method

public void run() {
  int sum = 0;
  Enumeration e = work.elements();
  while (e.hasMoreElements())
    sum += ((Integer) e.nextElement()).intValue();
  dest.add(sum);
}

[Figure, repeated on six slides: points-to graph for run with nodes Task, Vector, Accumulator, and Enumeration, the variable this, and edges labeled work and dest]

• Abstraction: Points-to Graph
  • Nodes Represent Objects
  • Edges Represent References
• Inside Nodes
  • Objects Created Within Current Analysis Scope
  • One Inside Node Per Allocation Site
  • Represents All Objects Created At That Site
• Outside Nodes
  • Objects Created Outside Current Analysis Scope
  • Objects Accessed Via References Created Outside Current Analysis Scope
  • One per Static Class Field
  • One per Parameter
  • One per Load Statement
    • Represents Objects Loaded at That Statement
• Inside Edges
  • References Created Inside Current Analysis Scope
• Outside Edges
  • References Created Outside Current Analysis Scope
  • Potential Interactions in Which Analyzed Part Reads Reference Created in Unanalyzed Part

Page 34: Analyses and Optimizations for Multithreaded Programs

Concept of Escaped Node

• Escaped Nodes Represent Objects Accessible Outside Current Analysis Scope
  • parameter nodes, load nodes
  • static class field nodes
  • nodes passed to unanalyzed methods
  • nodes reachable from unanalyzed but started threads
  • nodes reachable from escaped nodes
• Node is Captured if it is Not Escaped
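The escape rules amount to a reachability closure: start from the directly escaped nodes and add everything reachable from them. The following is a minimal sketch under simplifying assumptions: nodes are plain strings, field names on edges are ignored, and the class EscapeSketch, its method names, and the Enumeration-to-Vector edge in the demo graph are all hypothetical choices made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class EscapeSketch {
    // Closes the set of directly escaped nodes (roots) under
    // "reachable from an escaped node".
    static Set<String> escaped(Map<String, Set<String>> edges, Set<String> roots) {
        Set<String> seen = new TreeSet<>(roots);
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String n = work.pop();
            for (String m : edges.getOrDefault(n, Collections.<String>emptySet()))
                if (seen.add(m)) work.push(m);   // newly escaped node
        }
        return seen;
    }

    // Hypothetical graph for run(): Task is the parameter node for `this`.
    static String demo() {
        Map<String, Set<String>> edges = new HashMap<>();
        edges.put("Task", new TreeSet<>(Arrays.asList("Vector", "Accumulator")));
        edges.put("Enumeration", new TreeSet<>(Arrays.asList("Vector")));
        Set<String> all = new TreeSet<>(
            Arrays.asList("Task", "Vector", "Accumulator", "Enumeration"));
        Set<String> esc = escaped(edges, new TreeSet<>(Arrays.asList("Task")));
        all.removeAll(esc);                      // whatever remains is captured
        return "escaped=" + esc + " captured=" + all;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this demo graph nothing escaped points to the Enumeration node, so it stays captured even though it references an escaped Vector.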

Page 35: Analyses and Optimizations for Multithreaded Programs

Why Escaped Concept is Important

• Completeness of Analysis Information
  • Complete information for captured nodes
  • Potentially incomplete for escaped nodes
• Lifetime Implications
  • Captured nodes are inaccessible when analyzed part of the program terminates
  • Memory Management Optimizations
    • Stack allocation
    • Per-Thread Heap Allocation

Page 36: Analyses and Optimizations for Multithreaded Programs

Intrathread Dataflow Analysis

• Computes a points-to escape graph for each program point
• Points-to escape graph is a triple <I,O,e>
  • I - set of inside edges
  • O - set of outside edges
  • e - escape information for each node

Page 37: Analyses and Optimizations for Multithreaded Programs

Dataflow Analysis

• Initial state:
  • $I$: formals point to parameter nodes, classes point to class nodes
  • $O = \emptyset$
• Transfer functions:
  $I' = (I - \mathit{Kill}_I) \cup \mathit{Gen}_I$
  $O' = O \cup \mathit{Gen}_O$
• Confluence operator is $\cup$

Page 38: Analyses and Optimizations for Multithreaded Programs

Intraprocedural Analysis

• Must define transfer functions for:
  • copy statement: l = v
  • load statement: l1 = l2.f
  • store statement: l1.f = l2
  • return statement: return l
  • object creation site: l = new cl
  • method invocation: l = l0.op(l1…lk)

Page 39: Analyses and Optimizations for Multithreaded Programs

copy statement l = v

$\mathit{Kill}_I = \mathit{edges}(I, l)$
$\mathit{Gen}_I = \{l\} \times \mathit{succ}(I, v)$
$I' = (I - \mathit{Kill}_I) \cup \mathit{Gen}_I$

[Figure, two slides: the existing edges from l and v before the statement, then the generated edges from l to every node that v points to]
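The copy rule can be sketched in a few lines. This is only an illustration, assuming the variable part of the inside edge set I is stored as a map from variable name to the set of node names it may point to; CopyTransfer and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class CopyTransfer {
    // Returns I' for the statement l = v without modifying the input map.
    static Map<String, Set<String>> copy(Map<String, Set<String>> vars, String l, String v) {
        Map<String, Set<String>> out = new TreeMap<>(vars);
        Set<String> succV = vars.getOrDefault(v, Collections.<String>emptySet());
        // Replacing the entry for l removes Kill_I = edges(I, l)
        // and installs Gen_I = {l} x succ(I, v) in one step.
        out.put(l, new TreeSet<>(succV));
        return out;
    }

    static String demo() {
        Map<String, Set<String>> vars = new TreeMap<>();
        vars.put("l", new TreeSet<>(Arrays.asList("n1")));
        vars.put("v", new TreeSet<>(Arrays.asList("n2", "n3")));
        return copy(vars, "l", "v").toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the old edges from l are killed, this is a strong update: after the statement, l points exactly where v points.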

Page 41: Analyses and Optimizations for Multithreaded Programs

load statement l1 = l2.f

$S_E = \{n_2 \in \mathit{succ}(I, l_2) \mid \mathit{escaped}(n_2)\}$
$S_I = \bigcup \{\mathit{succ}(I, n_2, f) \mid n_2 \in \mathit{succ}(I, l_2)\}$

case 1: l2 does not point to an escaped node ($S_E = \emptyset$)

$\mathit{Kill}_I = \mathit{edges}(I, l_1)$
$\mathit{Gen}_I = \{l_1\} \times S_I$

[Figure, two slides: the existing edges from l1 and l2, with f edges leaving the nodes that l2 points to; then the generated edges from l1 to the targets of those f edges]

Page 43: Analyses and Optimizations for Multithreaded Programs

load statement l1 = l2.f

case 2: l2 does point to an escaped node ($S_E \neq \emptyset$)

$\mathit{Kill}_I = \mathit{edges}(I, l_1)$
$\mathit{Gen}_I = \{l_1\} \times (S_I \cup \{n\})$
$\mathit{Gen}_O = (S_E \times \{f\}) \times \{n\}$
where $n$ is the load node for l1 = l2.f

[Figure, two slides: the existing edges from l1 and l2; then the generated edge from l1 to the load node n and the outside edge labeled f from the escaped node to n]
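Both load cases fit in one routine that computes S_E and S_I and adds the load node only when S_E is non-empty. The following is a sketch under stated assumptions: nodes are strings, field edges are keyed by the string "node.field", and LoadTransfer with its fields is a hypothetical representation. Marking the load node as escaped follows the Concept of Escaped Node slide, which lists load nodes as escaped.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class LoadTransfer {
    final Map<String, Set<String>> vars = new TreeMap<>();     // variable -> nodes
    final Map<String, Set<String>> inside = new TreeMap<>();   // "node.f" -> nodes (inside edges)
    final Map<String, Set<String>> outside = new TreeMap<>();  // "node.f" -> nodes (outside edges)
    final Set<String> escaped = new TreeSet<>();

    // Transfer function for l1 = l2.f; loadNode is the node n for this statement.
    void load(String l1, String l2, String f, String loadNode) {
        Set<String> sE = new TreeSet<>();
        Set<String> sI = new TreeSet<>();
        for (String n2 : vars.getOrDefault(l2, Collections.<String>emptySet())) {
            if (escaped.contains(n2)) sE.add(n2);
            sI.addAll(inside.getOrDefault(n2 + "." + f, Collections.<String>emptySet()));
        }
        if (!sE.isEmpty()) {                   // case 2: some target of l2 is escaped
            sI.add(loadNode);                  // Gen_I gains <l1, n>
            for (String n2 : sE)               // Gen_O = (S_E x {f}) x {n}
                outside.computeIfAbsent(n2 + "." + f, k -> new TreeSet<>()).add(loadNode);
            escaped.add(loadNode);             // load nodes are escaped
        }
        vars.put(l1, sI);                      // kill old edges of l1, add Gen_I
    }

    static String demo() {
        LoadTransfer g = new LoadTransfer();
        // Case 1: x points only to captured node N with inside edge N.f -> M.
        g.vars.put("x", new TreeSet<>(Arrays.asList("N")));
        g.inside.put("N.f", new TreeSet<>(Arrays.asList("M")));
        g.load("y", "x", "f", "L1");
        // Case 2: this points to escaped parameter node P.
        g.vars.put("this", new TreeSet<>(Arrays.asList("P")));
        g.escaped.add("P");
        g.load("w", "this", "work", "L2");
        return "y=" + g.vars.get("y") + " w=" + g.vars.get("w") + " outside=" + g.outside;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo, the load through the captured node N creates no outside edge, while the load through the escaped node P creates one to the load node L2.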

Page 45: Analyses and Optimizations for Multithreaded Programs

store statement l1.f = l2

$\mathit{Gen}_I = (\mathit{succ}(I, l_1) \times \{f\}) \times \mathit{succ}(I, l_2)$
$I' = I \cup \mathit{Gen}_I$

[Figure, two slides: the existing edges from l1 and l2; then the generated edges labeled f from the nodes l1 points to, to the nodes l2 points to]
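The store rule has no kill set, so existing f edges survive (a weak update). A minimal sketch, assuming the same hypothetical string-keyed representation ("node.field" to target nodes); StoreTransfer is an illustrative name only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class StoreTransfer {
    // Returns the inside field edges after l1.f = l2: I' = I U Gen_I.
    static Map<String, Set<String>> store(Map<String, Set<String>> vars,
                                          Map<String, Set<String>> inside,
                                          String l1, String f, String l2) {
        Map<String, Set<String>> out = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : inside.entrySet())
            out.put(e.getKey(), new TreeSet<>(e.getValue()));   // copy I
        Set<String> succ2 = vars.getOrDefault(l2, Collections.<String>emptySet());
        if (succ2.isEmpty()) return out;                        // nothing to generate
        for (String n1 : vars.getOrDefault(l1, Collections.<String>emptySet()))
            out.computeIfAbsent(n1 + "." + f, k -> new TreeSet<>()).addAll(succ2);
        return out;
    }

    static String demo() {
        Map<String, Set<String>> vars = new TreeMap<>();
        vars.put("this", new TreeSet<>(Arrays.asList("T")));
        vars.put("w", new TreeSet<>(Arrays.asList("V")));
        Map<String, Set<String>> inside = new TreeMap<>();
        inside.put("T.work", new TreeSet<>(Arrays.asList("Old")));
        return store(vars, inside, "this", "work", "w").toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The demo keeps the earlier edge T.work to Old alongside the new edge to V, which is the conservative choice when one node may stand for several objects.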

Page 47: Analyses and Optimizations for Multithreaded Programs

object creation site l = new cl

$\mathit{Kill}_I = \mathit{edges}(I, l)$
$\mathit{Gen}_I = \{\langle l, n \rangle\}$
where $n$ is the inside node for l = new cl

[Figure, two slides: the existing edges from l; then the single generated edge from l to the inside node n]

Page 49: Analyses and Optimizations for Multithreaded Programs

Method Call

• Analysis of a method call:
  • Start with points-to escape graph before the call site
  • Retrieve the points-to escape graph from analysis of callee
  • Map outside nodes of callee graph to nodes of caller graph
  • Combine callee graph into caller graph
• Result is the points-to escape graph after the call site

Page 50: Analyses and Optimizations for Multithreaded Programs

[Figure, animated over five slides: on one side, the points-to escape graph before the call to t = new Task(v,a), with variables v, t, and a; on the other, the points-to escape graph from the analysis of Task(w,d), with this, w, d, and edges labeled work and dest]

• Start With Graph Before Call
• Retrieve Graph from Callee
• Map Parameters from Callee to Caller
• Transfer Edges from Callee to Caller (the combined graph after the call gains the work and dest edges)
• Discard Parameter Nodes from Callee
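The steps shown for t = new Task(v,a) can be sketched as a small mapping computation. This is a simplified illustration, not the algorithm from the talk: it assumes string node names, treats any unmapped callee node as mapping to itself, and omits escape information and action sets. The names CallMappingSketch, Edge, mu, and the node labels P0, P1, P2, T, V, A are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class CallMappingSketch {
    /** A field edge: source node, field name, target node. */
    static final class Edge {
        final String from, field, to;
        Edge(String from, String field, String to) {
            this.from = from; this.field = field; this.to = to;
        }
        @Override public String toString() { return from + "." + field + "->" + to; }
    }

    // mu maps callee nodes to caller nodes and arrives seeded with formals -> actuals.
    static List<String> combine(Map<String, Set<String>> mu, List<Edge> calleeInside,
                                List<Edge> calleeOutside, List<Edge> callerInside) {
        // Extend mapping: match callee outside edges against caller inside edges.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Edge o : calleeOutside)
                for (Edge i : callerInside)
                    if (o.field.equals(i.field)
                            && mu.getOrDefault(o.from, Collections.<String>emptySet())
                                 .contains(i.from))
                        changed |= mu.computeIfAbsent(o.to, k -> new TreeSet<>()).add(i.to);
        }
        // Combine: project callee inside edges through the mapping into the caller.
        Set<String> projected = new TreeSet<>();
        for (Edge e : calleeInside)
            for (String src : mu.getOrDefault(e.from, Collections.singleton(e.from)))
                for (String dst : mu.getOrDefault(e.to, Collections.singleton(e.to)))
                    projected.add(new Edge(src, e.field, dst).toString());
        return new ArrayList<>(projected);
    }

    // Task(w,d): parameter nodes P0 (this), P1 (w), P2 (d) map to caller nodes T, V, A.
    static String demo() {
        Map<String, Set<String>> mu = new TreeMap<>();
        mu.put("P0", new TreeSet<>(Arrays.asList("T")));
        mu.put("P1", new TreeSet<>(Arrays.asList("V")));
        mu.put("P2", new TreeSet<>(Arrays.asList("A")));
        List<Edge> calleeInside = Arrays.asList(
            new Edge("P0", "work", "P1"), new Edge("P0", "dest", "P2"));
        return combine(mu, calleeInside, new ArrayList<Edge>(), new ArrayList<Edge>())
            .toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The projected edges use only caller nodes, which corresponds to discarding the callee's parameter nodes once the edges have been transferred.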

Page 55: Analyses and Optimizations for Multithreaded Programs

More General Example

[Figure, animated over seven slides: on one side, the points-to escape graph before the call to x.foo(), with labels x, y, and z; on the other, the points-to escape graph from the analysis of foo(), with this. The last slides show the combined graph after the call to x.foo()]

• Initialize Mapping: Map Formals to Actuals
• Extend Mapping: Match Inside and Outside Edges
  • Mapping is Unidirectional: From Callee to Caller
• Complete Mapping: Automap Load and Inside Nodes Reachable from Mapped Nodes
• Combine Mapping: Project Edges from Callee Into Combined Graph
• Discard Callee Graph
• Discard Outside Edges From Captured Nodes

Page 62: Analyses and Optimizations for Multithreaded Programs

Interthread Analysis

• Augment Analysis Representation
  • Parallel Thread Set
  • Action Set (read, write, sync, create edge)
  • Action Ordering Information (relative to thread start actions)
• Thread Interaction Analysis
  • Combine points-to graphs
  • Induces combination of other information
• Can perform interthread analysis at any point to improve precision of results

Page 63: Analyses and Optimizations for Multithreaded Programs

Combining Points-to Graphs

[Figure, animated over thirteen slides: on one side, the points-to escape graph sometime after the call to x.start(), with variable x; on the other, the points-to escape graph from the analysis of run(), with this. The later slides show the combined points-to escape graph sometime after the call to x.start()]

• Initialize Mapping: Map Startee Thread to Starter Thread
• Extend Mapping: Match Inside and Outside Edges
  • Mapping is Bidirectional: From Startee to Starter, From Starter to Startee
• Complete Mapping: Automap Load and Inside Nodes Reachable from Mapped Nodes
• Combine Graphs: Project Edges Through Mappings Into Combined Graph
• Discard Startee Thread Node
• Discard Outside Edges From Captured Nodes

Page 76: Analyses and Optimizations for Multithreaded Programs

Life is not so Simple

• Dependences between phases
• Mapping best framed as constraint satisfaction problem
• Solved using constraint satisfaction algorithm

Page 77: Analyses and Optimizations for Multithreaded Programs

Interthread Analysis With Actions and Ordering

Page 78: Analyses and Optimizations for Multithreaded Programs

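Analysis Result for generateTask

[Figure: points-to graph with nodes labeled a through e, including a Task (with work and dest edges), a Vector, and an Accumulator, and the variable t]

• Parallel Threads: a
• Actions: wr a, wr b, wr c, wr d, sync b, rd b
• Action Ordering: "All actions happen before thread a starts executing"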

Page 79: Analyses and Optimizations for Multithreaded Programs

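Analysis Result for run

[Figure: points-to graph with nodes numbered 1 through 6, including a Task (with work and dest edges), a Vector, an Accumulator, and an Enumeration, and the variable this]

• Parallel Threads: no parallel threads
• Actions: rd 1, rd 2, rd 3, rd 4, rd 5, wr 5, rd 6, wr 6, sync 2, sync 5, edge(1,2), edge(1,5), edge(2,3), edge(3,4)
• Action Ordering: none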

Page 80: Analyses and Optimizations for Multithreaded Programs

Role of edge(1,2) Actions

• One edge action for each outside edge
• Action order for edge actions improves precision of interthread analysis
  • If starter thread reads a reference before startee thread is started
  • Then reference was not created by startee thread
• Outside edge actions record order
• Inside edges from startee matched only against parallel outside edges

Page 81: Analyses and Optimizations for Multithreaded Programs

Edge Actions in Combining Points-to Graphs

[Figure, two slides: on one side, the points-to escape graph sometime after the call to x.start(), with variable x and nodes 1, 2, and 3; on the other, the points-to escape graph from the analysis of run(), with this]

• First slide, Action Ordering: edge(1,2) || 1
• Second slide, Action Ordering: none (i.e., edge(1,2) created before 1 started)

Page 83: Analyses and Optimizations for Multithreaded Programs

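Analysis Result After Interaction

[Figure: points-to graph with nodes labeled a through e, including a Task (with work and dest edges), a Vector, and an Accumulator, and the variable t]

• Parallel Threads: a
• Actions: wr a, wr b, wr c, wr d, sync b, rd b
• Actions tagged with thread a: rd a, a; rd b, a; rd c, a; rd d, a; rd e, a; wr e, a; sync b, a; sync e, a
• Action Ordering: "All actions from current thread happen before thread a starts executing"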

Page 84: Analyses and Optimizations for Multithreaded Programs

Roles of Intrathread and Interthread Analyses

• Basic Analysis
  • Intrathread analysis delivers parallel interaction graph at each program point
    • records parallel threads
    • does not compute thread interaction
  • Choose program point (end of method)
  • Interthread analysis delivers additional precision at that program point
• Does not exploit ordering information from thread join constructs

Page 85: Analyses and Optimizations for Multithreaded Programs

Join Ordering

t = new Task();
t.start();
  "computation that runs in parallel with task t"
t.join();
  "computation that runs after task t"

t.run(): "computation from task t"
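The three regions can be seen in a small runnable program. This is an illustrative sketch only; JoinDemo, its result field, and the arithmetic are made-up placeholders. It relies on the Java guarantee that everything a thread did happens-before a successful return from join() on that thread.

```java
class JoinDemo {
    static int result;  // written by the task, read only after join()

    static int compute() {
        Thread t = new Thread(() -> { result = 6 * 7; });  // computation from task t
        t.start();
        int parallel = 1;            // computation that runs in parallel with task t
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result + parallel;    // computation that runs after task t
    }

    public static void main(String[] args) {
        System.out.println(compute());
    }
}
```

The read of result after join() needs no lock, because that code cannot overlap with the task.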

Page 86: Analyses and Optimizations for Multithreaded Programs

Exploiting Join Ordering

• At join point
  • Interthread analysis delivers new (more precise) parallel interaction graph
  • Intrathread analysis uses new graph
• No parallel interactions between
  • Thread
  • Computation after join

Page 87: Analyses and Optimizations for Multithreaded Programs

Extensions

• Partial program analysis
  • can analyze method independent of callers
  • can analyze method independent of methods it invokes
  • can incrementally analyze callees to improve precision
• Dial down precision to improve efficiency
• Demand-driven formulations

Page 88: Analyses and Optimizations for Multithreaded Programs

Key Ideas

• Explicitly represent potential interactions between analyzed and unanalyzed parts
  • Inside versus outside nodes and edges
  • Escaped versus captured nodes
  • Precisely bound ignorance
• Exploit ordering information
  • intrathread (flow sensitive)
  • interthread (starts, edge orders, joins)

Page 89: Analyses and Optimizations for Multithreaded Programs

Analysis Uses

Overheads in Standard Execution and How to Eliminate Them

Page 90: Analyses and Optimizations for Multithreaded Programs

Intrathread Analysis Result from End of run Method

[Figure: points-to graph with nodes numbered 1 through 6, including a Task (with work and dest edges), a Vector, an Accumulator, and an Enumeration, and the variable this]

• Enumeration object is captured
  • Does not escape to caller
  • Does not escape to parallel threads
• Lifetime of Enumeration object is bounded by lifetime of run
• Can allocate Enumeration object on call stack instead of heap

Page 91: Analyses and Optimizations for Multithreaded Programs

Interthread Analysis Result from End of generateTask Method

[Figure: points-to graph with nodes labeled a through e, including a Task (with work and dest edges), a Vector, and an Accumulator, and the variable t]

• Parallel Threads: a
• Actions: wr a, wr b, wr c, wr d, sync b, rd b
• Actions tagged with thread a: rd a, a; rd b, a; rd c, a; rd d, a; rd e, a; wr e, a; sync b, a; sync e, a
• Action Ordering: "All actions from current thread happen before thread a starts executing"

• Vector object is captured
• Multiple threads synchronize on Vector object
• But synchronizations from different threads do not occur concurrently
• Can eliminate synchronization on Vector object

Page 92: Analyses and Optimizations for Multithreaded Programs

Interthread Analysis Result from End of generateTask Method

[Figure: points-to graph with nodes labeled a through e, including a Task (with work and dest edges), a Vector, and an Accumulator, and the variable t]

• Parallel Threads: a
• Actions: wr a, wr b, wr c, wr d, sync b, rd b
• Actions tagged with thread a: rd a, a; rd b, a; rd c, a; rd d, a; rd e, a; wr e, a; sync b, a; sync e, a
• Action Ordering: "All actions from current thread happen before thread a starts executing"

• Vectors, Tasks, Integers captured
• Parent, child access objects
• Parent completes accesses before child starts accesses
• Can allocate objects on child's per-thread heap

Page 93: Analyses and Optimizations for Multithreaded Programs

Thread Overhead

• Inefficient Thread Implementations
  • Thread Creation Overhead
  • Thread Management Overhead
  • Stack Overhead
• Use a more efficient thread implementation
  • User-level thread management
  • Per-thread heaps
  • Event-driven form

Page 94: Analyses and Optimizations for Multithreaded Programs

Standard Thread Implementation

[Figure, two slides: a thread stack holding call frames, each with a return address, a frame pointer, and local variables (x, y in one frame; a, b, c in the other); the second slide adds a save area on the stack]

• Call frames allocated on stack
• Context Switch
  • Save state on stack
  • Resume another thread
• One stack per thread

Page 96: Analyses and Optimizations for Multithreaded Programs

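Event-Driven Form

[Figure: a stack holding call frames, each with a return address, a frame pointer, and local variables (x, y in one frame; a, b, c in the other), plus heap continuations that hold the live variables c and x, each with a resume method]

• Call frames allocated on stack
• Context Switch
  • Build continuation on heap
  • Copy out live variables
  • Return out of computation
  • Resume another continuation
• One stack per processor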
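The context-switch steps of the event-driven form can be mimicked by hand in Java. This is a toy sketch, not the compiler transformation from the talk: EventLoopSketch, handler, and the ready queue are hypothetical, and a lambda stands in for the heap continuation that captures the live variable.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class EventLoopSketch {
    static final Deque<Runnable> ready = new ArrayDeque<>();  // runnable continuations
    static final List<String> log = new ArrayList<>();

    // A "thread" body that would block between step 1 and step 2.
    static void handler(String name, int x) {
        log.add(name + " step1 x=" + x);
        int live = x + 1;  // variable that is live across the switch point
        // Context switch: build a continuation on the heap that copies out
        // `live`, then return out of the computation instead of blocking.
        ready.add(() -> log.add(name + " step2 live=" + live));
    }

    // Single scheduler loop: one stack serves every logical thread.
    static List<String> run() {
        ready.clear();
        log.clear();
        ready.add(() -> handler("A", 1));
        ready.add(() -> handler("B", 10));
        while (!ready.isEmpty()) ready.poll().run();  // resume another continuation
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The two logical threads interleave at the switch point: both first steps run before either second step, although only one Java thread and one stack are in use.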

Page 97: Analyses and Optimizations for Multithreaded Programs

Complications

• Standard thread models use blocking I/O
  • Automatically convert blocking I/O to asynchronous I/O
  • Scheduler manages interleaving of thread executions
• Stack Allocatable Objects May Be Live Across Blocking Calls
  • Transfer allocation to per-thread heap

Page 98: Analyses and Optimizations for Multithreaded Programs

Opportunity

• On a uniprocessor, compiler controls placement of context switch points
• If program does not hold lock across blocking call, can eliminate lock

Page 99: Analyses and Optimizations for Multithreaded Programs

Experimental Results

• MIT Flex Compiler System
  • Static Compiler
  • Native code for StrongARM
• Server Benchmarks
  • http, phone, echo, time
• Scientific Computing Benchmarks
  • water, barnes

Page 100: Analyses and Optimizations for Multithreaded Programs

Server Benchmark Characteristics

Benchmark   IR Size (instrs)   Number of Methods   Pre-Analysis Time (secs)   Intrathread Analysis Time (secs)   Interthread Analysis Time (secs)
echo        4,639              131                 28                         74                                 73
time        4,573              136                 29                         70                                 74
http        10,643             292                 103                        199                                269
phone       9,547              267                 75                         191                                256

Page 101: Analyses and Optimizations for Multithreaded Programs

Percentage of Eliminated Synchronization Operations

[Figure: bar chart (axis 0 to 100) for http, phone, time, echo, and mtrt, comparing intrathread-only analysis with interthread analysis]

Page 102: Analyses and Optimizations for Multithreaded Programs

Compilation Options for Performance Results

• Standard
  • kernel threads, synch included
• Event-Driven
  • event-driven, no synch at all
• +Per-Thread Heap
  • event-driven, no synch at all, per-thread heap allocation

Page 103: Analyses and Optimizations for Multithreaded Programs

Throughput (Responses per Second)

[Figure: bar chart (axis 0 to 400) for echo, time, http2K, http20K, and phone under the Standard, Event-Driven, and +Per-Thread Heap options]

Page 104: Analyses and Optimizations for Multithreaded Programs

Scientific Benchmark Characteristics

Benchmark   IR Size (instrs)   Number of Methods   Pre-Analysis Time (secs)   Total Analysis Time (secs)
water       25,583             335                 380                        1156
barnes      19,764             364                 129                        491

Page 105: Analyses and Optimizations for Multithreaded Programs

Compiler Options

0: Sequential C++
1: Baseline - Kernel Threads
2: Lightweight Threads
3: Lightweight Threads + Stack Allocation
4: Lightweight Threads + Stack Allocation - Synchronization

Page 106: Analyses and Optimizations for Multithreaded Programs

Execution Times

[Figure: bar chart of execution time as a proportion of sequential C++ execution time (axis 0 to 1) for water small, water, and barnes under the Baseline, +Light, +Stack, and -Synch options]

Page 107: Analyses and Optimizations for Multithreaded Programs

Related Work

• Pointer Analysis for Sequential Programs
  • Chatterjee, Ryder, Landi (POPL 99)
  • Sathyanathan & Lam (LCPC 96)
  • Steensgaard (POPL 96)
  • Wilson & Lam (PLDI 95)
  • Emami, Ghiya, Hendren (PLDI 94)
  • Choi, Burke, Carini (POPL 93)

Page 108: Analyses and Optimizations for Multithreaded Programs

Related Work

• Pointer Analysis for Multithreaded Programs
  • Rugina and Rinard (PLDI 99) (fork-join parallelism, not compositional)
  • We have extended our points-to analysis for multithreaded programs (irregular, thread-based concurrency, compositional)
• Escape Analysis
  • Blanchet (POPL 98)
  • Deutsch (POPL 90, POPL 97)
  • Park & Goldberg (PLDI 92)

Page 109: Analyses and Optimizations for Multithreaded Programs

Related Work

• Synchronization Optimizations
  • Diniz & Rinard (LCPC 96, POPL 97)
  • Plevyak, Zhang, Chien (POPL 95)
  • Aldrich, Chambers, Sirer, Eggers (SAS 99)
  • Blanchet (OOPSLA 99)
  • Bogda, Hoelzle (OOPSLA 99)
  • Choi, Gupta, Serrano, Sreedhar, Midkiff (OOPSLA 99)
  • Ruf (PLDI 00)

Page 110: Analyses and Optimizations for Multithreaded Programs

Conclusion

• New Analysis Algorithm
  • Flow-sensitive, compositional
  • Multithreaded programs
  • Explicitly represent interactions between analyzed and unanalyzed parts
• Analysis Uses
  • Synchronization elimination
  • Stack allocation
  • Per-thread heap allocation
• Lightweight Threads