sound and precise analysis of parallel programs through schedule specialization

36
Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University 1

Upload: jag

Post on 22-Feb-2016

40 views

Category:

Documents


0 download

DESCRIPTION

Sound and Precise Analysis of Parallel Programs through Schedule Specialization. Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University. Motivation. Analyzing parallel programs is difficult. . precision. Total Schedules. Dynamic Analysis. Analyzed Schedules. ?. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Sound and Precise Analysis ofParallel Programs through

Schedule Specialization

Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng YangColumbia University

1

Page 2: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

2

Motivation

soundness (# of analyzed schedules / # of total schedules)

precision Total Schedules

AnalyzedSchedules

StaticAnalysis

DynamicAnalysis

AnalyzedSchedules

?

• Analyzing parallel programs is difficult.

Page 3: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

3

• Precision: Analyze the program over a small set of schedules. • Soundness: Enforce these schedules at runtime.

Schedule Specialization

soundness (# of analyzed schedules / # of total schedules)

precision Total Schedules

StaticAnalysis

DynamicAnalysis

AnalyzedSchedules

EnforcedSchedules

ScheduleSpecialization

Page 4: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

4

Enforcing Schedules Using Peregrine

• Deterministic multithreading– e.g. DMP (ASPLOS ’09), Kendo (ASPLOS ’09), CoreDet

(ASPLOS ’10), Tern (OSDI ’10), Peregrine (SOSP ’11), DTHREADS (SOSP ’11)

– Performance overhead• e.g. Kendo: 16%, Tern & Peregrine: 39.1%

• Peregrine– Record schedules, and reuse them on a wide range of

inputs.– Represent schedules explicitly.

Page 5: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

5

• Precision: Analyze the program over a small set of schedules. • Soundness: Enforce these schedules at runtime.

Schedule Specialization

soundness (# of analyzed schedules / # of total schedules)

precision

StaticAnalysis

DynamicAnalysis

AnalyzedSchedules

EnforcedSchedulesSchedule

Specialization

Page 6: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

6

Framework

• Extract control flow and data flow enforced by a set of schedules

Schedule

ScheduleSpecializationProgram

C/C++ programwith Pthread

Total order ofsynchronizations

SpecializedProgram

Extra def-usechains

Page 7: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

7

Outline

• Example• Control-Flow Specialization• Data-Flow Specialization• Results• Conclusion

Page 8: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Running Example

int results[p_max];int global_id = 0;

int main(int argc, char *argv[]) { int i; int p = atoi(argv[1]); for (i = 0; i < p; ++i) pthread_create(&child[i], 0, worker, 0); for (i = 0; i < p; ++i) pthread_join(child[i], 0); return 0;}

void *worker(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}

8

Thread 0 Thread 1 Thread 2

create

create

join

join

lock

unlocklock

unlock

Race-free?

Page 9: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Control-Flow Specializationint main(int argc, char *argv[]) { int i; int p = atoi(argv[1]); for (i = 0; i < p; ++i) pthread_create(&child[i], 0, worker, 0); for (i = 0; i < p; ++i) pthread_join(child[i], 0); return 0;}

9

create

create

join

join

atoi

++i

create

return

i = 0

i < p

++i

join

i < p

i = 0

Page 10: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Control-Flow Specializationint main(int argc, char *argv[]) { int i; int p = atoi(argv[1]); for (i = 0; i < p; ++i) pthread_create(&child[i], 0, worker, 0); for (i = 0; i < p; ++i) pthread_join(child[i], 0); return 0;}

10

create

create

join

join

atoi

++i

create

return

i = 0

i < p

++i

join

i < p

i = 0

atoi

create

i = 0

i < p

Page 11: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Control-Flow Specializationint main(int argc, char *argv[]) { int i; int p = atoi(argv[1]); for (i = 0; i < p; ++i) pthread_create(&child[i], 0, worker, 0); for (i = 0; i < p; ++i) pthread_join(child[i], 0); return 0;}

11

create

create

join

join

atoi

++i

create

return

i = 0

i < p

++i

join

i < p

i = 0

create

atoi

i = 0

i < p

create

++i

create

i < p

Page 12: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Control-Flow Specializationint main(int argc, char *argv[]) { int i; int p = atoi(argv[1]); for (i = 0; i < p; ++i) pthread_create(&child[i], 0, worker, 0); for (i = 0; i < p; ++i) pthread_join(child[i], 0); return 0;}

12

create

create

join

join

atoi

++i

create

return

i = 0

i < p

++i

join

i < p

i = 0

atoi

create

i = 0

i < p

++i

create

i < p

++i

i < p

join

i < p

i = 0

++i

join

i < p

++i

i < p

return

Page 13: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

13

Control-Flow Specialized Program

int main(int argc, char *argv[]) { int i; int p = atoi(argv[1]); i = 0; // i < p == true pthread_create(&child[i], 0, worker.clone1, 0); ++i; // i < p == true pthread_create(&child[i], 0, worker.clone2, 0); ++i; // i < p == false i = 0; // i < p == true pthread_join(child[i], 0); ++i; // i < p == true pthread_join(child[i], 0); ++i; // i < p == false return 0;}

atoi

create

i = 0

i < p

++i

create

i < p

++i

i < p

join

i < p

i = 0

++i

join

i < p

++i

i < p

return

Page 14: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

14

More Challenges onControl-Flow Specialization

• Ambiguity

call

Caller Callee

call

S1

• A schedule has too many synchronizations

ret

S2

Page 15: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Data-Flow Specialization

int global_id = 0;

void *worker.clone1(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}

void *worker.clone2(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}

15

Thread 0 Thread 1 Thread 2

create

create

join

join

lock

unlock

lock

unlock

global_id = 0

my_id = global_idglobal_id++

my_id = global_idglobal_id++

Page 16: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Data-Flow Specialization

int global_id = 0;

void *worker.clone1(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}

void *worker.clone2(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}

16

Thread 0 Thread 1 Thread 2

create

create

join

join

lock

unlock

lock

unlock

global_id = 0

my_id = global_idglobal_id++

my_id = global_idglobal_id++

Page 17: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Data-Flow Specialization

int global_id = 0;

void *worker.clone1(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}

void *worker.clone2(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}

17

Thread 0 Thread 1 Thread 2

create

create

join

join

lock

unlock

lock

unlock

global_id = 0

my_id = 0global_id = 1

my_id = global_idglobal_id++

Page 18: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Data-Flow Specialization

int global_id = 0;

void *worker.clone1(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}

void *worker.clone2(void *arg) { pthread_mutex_lock(&global_id_lock); int my_id = global_id++; pthread_mutex_unlock(&global_id_lock); results[my_id] = compute(my_id); return 0;}

18

Thread 0 Thread 1 Thread 2

create

create

join

join

lock

unlock

lock

unlock

global_id = 0

my_id = 0global_id = 1

my_id = 1global_id = 2

Page 19: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Data-Flow Specialization

int global_id = 0;

void *worker.clone1(void *arg) { pthread_mutex_lock(&global_id_lock); global_id = 1; pthread_mutex_unlock(&global_id_lock); results[0] = compute(0); return 0;}

void *worker.clone2(void *arg) { pthread_mutex_lock(&global_id_lock); global_id = 2; pthread_mutex_unlock(&global_id_lock); results[1] = compute(1); return 0;}

19

Thread 0 Thread 1 Thread 2

create

create

join

join

lock

unlock

lock

unlock

global_id = 0

my_id = 0global_id = 1

my_id = 1global_id = 2

Page 20: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

20

More Challenges onData-Flow Specialization

• Must/May alias analysis– global_id

• Reasoning about integers– results[0] = compute(0)– results[1] = compute(1)

• Many def-use chains

Page 21: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

21

Evaluation

• Applications– Static race detector– Alias analyzer– Path slicer

• Programs– PBZip2 1.1.5– aget 0.4.1– 8 programs in SPLASH2– 7 programs in PARSEC

Page 22: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

22

Program Original Specialized

aget 72 0

PBZip2 125 0

fft 96 0

blackscholes 3 0

swaptions 165 0

streamcluster 4 0

canneal 21 0

bodytrack 4 0

ferret 6 0

raytrace 215 0

cholesky 31 7

radix 53 14

water-spatial 2447 1799

lu-contig 18 18

barnes 370 369

water-nsquared 354 333

ocean 331 292

StaticRaceDetector

# of FalsePositives

Page 23: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

23

Program Original Specialized

aget 72 0

PBZip2 125 0

fft 96 0

blackscholes 3 0

swaptions 165 0

streamcluster 4 0

canneal 21 0

bodytrack 4 0

ferret 6 0

raytrace 215 0

cholesky 31 7

radix 53 14

water-spatial 2447 1799

lu-contig 18 18

barnes 370 369

water-nsquared 354 333

ocean 331 292

StaticRaceDetector

# of FalsePositives

Page 24: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

24

Program Original Specialized

aget 72 0

PBZip2 125 0

fft 96 0

blackscholes 3 0

swaptions 165 0

streamcluster 4 0

canneal 21 0

bodytrack 4 0

ferret 6 0

raytrace 215 0

cholesky 31 7

radix 53 14

water-spatial 2447 1799

lu-contig 18 18

barnes 370 369

water-nsquared 354 333

ocean 331 292

StaticRaceDetector

# of FalsePositives

Page 25: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

25

Program Original Specialized

aget 72 0

PBZip2 125 0

fft 96 0

blackscholes 3 0

swaptions 165 0

streamcluster 4 0

canneal 21 0

bodytrack 4 0

ferret 6 0

raytrace 215 0

cholesky 31 7

radix 53 14

water-spatial 2447 1799

lu-contig 18 18

barnes 370 369

water-nsquared 354 333

ocean 331 292

StaticRaceDetector

# of FalsePositives

Page 26: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

26

Static Race Detector: Harmful Races Detected

• 4 in aget• 2 in radix• 1 in fft

Page 27: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

27

Precision of Schedule-AwareAlias Analysis

Page 28: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

28

Precision of Schedule-AwareAlias Analysis

Page 29: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

29

Precision of Schedule-AwareAlias Analysis

Page 30: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

30

Conclusion and Future Work

• Designed and implemented schedule specialization framework– Analyzes the program over a small set of schedules– Enforces these schedules at runtime

• Built and evaluated three applications– Easy to use– Precise

• Future work– More applications– Similar specialization ideas on sequential programs

Page 31: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

31

Related Work• Program analysis for parallel programs

– Chord (PLDI ’06), RADAR (PLDI ’08), FastTrack (PLDI ’09)• Slicing

– Horgon (PLDI ’90), Bouncer (SOSP ’07), Jhala (PLDI ’05), Weiser (PhD thesis), Zhang (PLDI ’04)

• Deterministic multithreading– DMP (ASPLOS ’09), Kendo (ASPLOS ’09), CoreDet (ASPLOS ’10),

Tern (OSDI ’10), Peregrine (SOSP ’11), DTHREADS (SOSP ’11)• Program specialization

– Consel (POPL ’93), Gluck (ISPL ’95), Jørgensen (POPL ’92), Nirkhe (POPL ’92), Reps (PDSPE ’96)

Page 32: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

32

Backup Slides

Page 33: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

33

Specialization Time

Page 34: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

34

Handling Races

• We do not assume data-race freedom. • We could if our only goal is optimization.

Page 35: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

35

Input Coverage

• Use runtime verification for the inputs not covered

• A small set of schedules can cover a wide range of inputs

Page 36: Sound and Precise Analysis of Parallel Programs through Schedule Specialization

36