Deterministic Parallel Algorithms and Programming

Guy Blelloch, Carnegie Mellon University

EC2, 2011 1

Parallelism vs. Concurrency

                            Concurrency
                            sequential                  concurrent
Parallelism   serial        Traditional programming     Traditional OS
              parallel      Deterministic parallelism   General parallelism

• Parallelism: using multiple processors/cores running at the same time. A property of the machine.

• Concurrency: non-determinism due to interleaving threads. Needed for some “interactive” applications.

Deterministic parallel algorithms/programs have great properties, but how practical are they?

Outline:

1. What are the nice properties?
2. How general are they?
3. Recent results on performance of deterministic algorithms

Concurrency: Stack Example 1

struct link { int v; link* next; };

// Sequential code: unsafe if push and pop run concurrently.
struct stack {
  link* headPtr;

  void push(link* a) {
    a->next = headPtr;
    headPtr = a;
  }

  link* pop() {
    link* h = headPtr;
    if (headPtr != NULL) headPtr = headPtr->next;
    return h;
  }
};

[Figure: head pointer H and nodes A, B]


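Concurrency: Stack Example 2

// CAS(p, old, new): atomic compare-and-swap; stores new into *p and
// returns true only if *p still equals old.
struct stack {
  link* headPtr;

  void push(link* a) {
    link* h;
    do {
      h = headPtr;
      a->next = h;
    } while (!CAS(&headPtr, h, a));
  }

  link* pop() {
    link* h;
    link* nxt;
    do {
      h = headPtr;
      if (h == NULL) return NULL;
      nxt = h->next;
    } while (!CAS(&headPtr, h, nxt));
    return h;
  }
};

[Figure: head pointer H and nodes A, B]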


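Concurrency: Stack Example 2'

P1: x = s.pop(); y = s.pop(); s.push(x);
P2: z = s.pop();

The ABA problem

Before: A B C
After:  B C

Interleaving that produces it:
  P2: h = headPtr;
  P2: nxt = h->next;
  P1: everything
  P2: CAS(&headPtr, h, nxt)

P2's CAS succeeds because headPtr again equals A, yet nxt is the stale B that P1 already popped, so B reappears on the stack and A is lost.

Can be fixed with counter and 2CAS, but…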
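The counter idea can be sketched as follows. This is a minimal C++ sketch under a simplifying assumption that is not in the slides: nodes live in a fixed pool and are named by 32-bit index, so a version counter and the top index fit in one 64-bit word and an ordinary single-word CAS stands in for 2CAS. The names `TaggedStack`, `NIL`, `pack`, `index`, and `version` are illustrative only, and a 32-bit counter can wrap, so this makes ABA unlikely rather than impossible.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel index meaning "no node" (hypothetical name).
constexpr std::uint32_t NIL = 0xFFFFFFFFu;

// Assumption: nodes are slots 0..n-1 of a fixed pool, named by index.
struct TaggedStack {
    std::vector<std::atomic<std::uint32_t>> next;  // next[i]: node below node i
    std::atomic<std::uint64_t> head;               // (version << 32) | top index

    explicit TaggedStack(std::size_t n) : next(n), head(pack(NIL, 0)) {
        for (auto& x : next) x.store(NIL);
    }

    static std::uint64_t pack(std::uint32_t idx, std::uint32_t ver) {
        return (static_cast<std::uint64_t>(ver) << 32) | idx;
    }
    static std::uint32_t index(std::uint64_t w) {
        return static_cast<std::uint32_t>(w);
    }
    static std::uint32_t version(std::uint64_t w) {
        return static_cast<std::uint32_t>(w >> 32);
    }

    void push(std::uint32_t a) {
        assert(a < next.size());
        std::uint64_t h = head.load();
        do {
            next[a].store(index(h));
            // Every successful update bumps the version, so a stale head
            // word never compares equal even if the same node is on top.
        } while (!head.compare_exchange_weak(h, pack(a, version(h) + 1)));
    }

    std::uint32_t pop() {
        std::uint64_t h = head.load();
        while (index(h) != NIL) {
            std::uint32_t nxt = next[index(h)].load();
            if (head.compare_exchange_weak(h, pack(nxt, version(h) + 1)))
                return index(h);  // h is unchanged on success
            // On failure h was refreshed with the current head; retry.
        }
        return NIL;
    }
};
```

Replaying the interleaving from the slide against this sketch, P2's final CAS compares the whole (version, index) word and fails, because P1's three operations advanced the version even though node A is on top again.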

Concurrency: Stack Example 3

struct link { int v; link* next; };

struct stack {
  link* headPtr;

  // atomic { ... } marks a block that executes as a single transaction.
  void push(link* a) {
    atomic {
      a->next = headPtr;
      headPtr = a;
    }
  }

  link* pop() {
    atomic {
      link* h = headPtr;
      if (headPtr != NULL) headPtr = headPtr->next;
      return h;
    }
  }
};

Concurrency: Stack Example 3'

// Each pop and push is atomic, but swapTop as a whole is not: another
// thread can run between the four calls.
void swapTop(stack& s) {
  link* x = s.pop();
  link* y = s.pop();
  s.push(x);
  s.push(y);
}

Queues are trickier than stacks.

Race Free ≠ Deterministic
Transactions ≠ Deterministic
Linearizability ≠ Deterministic

Parallel Example: Quicksort

function quicksort(S) =
if (#S <= 1) then S
else let
  a  = S[rand(#S)];
  S1 = {e in S | e < a};
  S2 = {e in S | e = a};
  S3 = {e in S | e > a};
  R  = {quicksort(v) : v in [S1, S3]};
in R[0] ++ S2 ++ R[1];

• { … } means available parallelism
• Dynamically scheduled (e.g. work stealing)
• Reasoning on correctness is no different from what we already teach.
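For readers more used to C++ than NESL, the same fork-join shape can be sketched with `std::async`. This is only an illustration under stated assumptions: the pivot is the middle element rather than `rand`, the three filters run sequentially, and each fork may start a new thread instead of going through a work-stealing scheduler, so it is not meant for large inputs. The `policy` parameter is an addition for the sketch.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Fork-join quicksort on a copy of the input. The sorted result does not
// depend on how the two recursive calls are scheduled.
std::vector<int> quicksort(const std::vector<int>& s,
                           std::launch policy = std::launch::async) {
    if (s.size() <= 1) return s;
    const int a = s[s.size() / 2];  // assumption: fixed pivot, not rand(#S)
    std::vector<int> s1, s2, s3;
    for (int e : s) {
        if (e < a) s1.push_back(e);
        else if (e == a) s2.push_back(e);
        else s3.push_back(e);
    }
    // Fork: sort the smaller elements as a separate task.
    std::future<std::vector<int>> left =
        std::async(policy, [&s1, policy] { return quicksort(s1, policy); });
    // Sort the larger elements on the current thread.
    std::vector<int> right = quicksort(s3, policy);
    // Join: wait for the forked task, then concatenate S1 ++ S2 ++ S3.
    std::vector<int> out = left.get();
    out.insert(out.end(), s2.begin(), s2.end());
    out.insert(out.end(), right.begin(), right.end());
    assert(out.size() == s.size());
    return out;
}
```

Passing `std::launch::deferred` runs the forked task inside `get()`, on the calling thread. The output is identical, which is the point: correctness can be argued exactly as for the sequential code.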

Advantages of Determinism

1. Easier to understand and reason about code
2. Composable
3. Easier to write assertions
4. Easier to formally verify (avoids exponential search across interleavings)
5. Easier to debug (avoids Heisenbugs)
6. Easier to understand performance
7. Easier for hardware/compiler to optimize


How to Get Deterministic Parallelism

What is the most general model?
Strictly synchronous (data-parallel)?
Purely functional?

It is undecidable to determine if a program returns the same result independent of interleaving.

How to Get Deterministic Parallelism

What is the most general model? Undecidable.
But here is the most general definition I know of:

1. Arbitrary spawns by a thread of any number of child threads.
2. Termination of threads at any point.
3. Synchronization among threads only through condition variables.
4. All concurrent operations on state commute.

Condition Variables

cond* x = new cond;
    Create a condition variable (initially “clear”).

x->signal();
    Set the signal; all waiting threads can proceed. Must be called at most once. Needs to be verified.

x->wait();
    Wait until the signal is set.

These define a dependence graph on program instructions, with cross edges from a signal on x to each of the waits on x.
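A write-once condition variable of this kind can be sketched on top of the C++ standard library. This is a minimal sketch, not the implementation behind the slides: `std::mutex` and `std::condition_variable` are assumed as the building blocks, the at-most-once rule is only checked with a debug `assert`, and `is_set()` is a hypothetical helper added for inspection.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Starts "clear"; signal() sets it once and it stays set.
class cond {
    std::mutex m_;
    std::condition_variable cv_;
    bool set_ = false;

public:
    // Set the signal and release every waiting thread.
    void signal() {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!set_ && "signal() must be called at most once");
            set_ = true;
        }
        cv_.notify_all();
    }

    // Block until the signal is set; returns immediately if it already is.
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return set_; });
    }

    // Hypothetical helper, not on the slide: non-blocking query.
    bool is_set() {
        std::lock_guard<std::mutex> lock(m_);
        return set_;
    }
};
```

Because the flag only ever moves from clear to set, a `wait()` observes the same outcome whether it runs before or after the `signal()`. That is what lets the signal-to-wait edges form a fixed dependence graph.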

Nested Parallelism

Condition variables are hard to work with, so we consider nested parallel computations: arbitrary nesting of fork-join and parallel loops.

Has some important advantages:
– Good for caching
– Reduces scheduling overhead
– Supported by many languages
– Easy to analyze costs
– Makes it easier to verify code??

Commuting Operations (roughly)

Operations E1 and E2 are concurrent if there is no path in the dependence graph between them.

Let E(M) -> M' be an operation that transforms the state from M to M'.

E1 and E2 commute with respect to state M if
    E1(E2(M)) = E2(E1(M))

Any concurrent operations must commute with respect to all states. Needs to be verified.

Commuting Operations (roughly)

Examples:
• read (or any non-modifying query on a data structure)
• writeMin, writeAndAdd
• insert into an ordered dictionary
• delete from a dictionary (but insert does not commute with delete)
• uniqueLabel
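As an illustration of one entry in this list, writeMin can be sketched with a compare-and-swap loop over `std::atomic<int>`. The signature is an assumption, since the slides name the operation but do not show its code. It returns nothing on purpose: a "did I win?" return value would depend on call order and so would not commute. `checkWriteMinCommutes` is a hypothetical helper that checks E1(E2(M)) = E2(E1(M)) for one start state.

```cpp
#include <atomic>
#include <cassert>

// Atomically replace loc with min(loc, val).
void writeMin(std::atomic<int>& loc, int val) {
    int cur = loc.load();
    // Retry only while val would still lower the stored value; a failed
    // CAS refreshes cur with the latest contents of loc.
    while (val < cur && !loc.compare_exchange_weak(cur, val)) {
    }
}

// Hypothetical helper: apply two writeMins in both orders from the same
// start state and check that the final states agree.
void checkWriteMinCommutes(int init, int v1, int v2) {
    std::atomic<int> a(init), b(init);
    writeMin(a, v1);
    writeMin(a, v2);
    writeMin(b, v2);
    writeMin(b, v1);
    assert(a.load() == b.load());
}
```

A plain store would fail the same check, because the last writer wins. That is why an unconditional write is absent from the list above.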

Our Experiments

We coded up 16 benchmark problems using nested parallelism and commutative operations (deterministic).

Trying to answer the question: can one get good efficiency with deterministic parallelism?

Preliminary Benchmarks I

Generic             Comparison Sorting
                    Removing Duplicates
                    Dictionary

Graphs              Breadth First Search
                    Graph Separators
                    Minimum Spanning Tree
                    Maximal Independent Set

Geometry/Graphics   Delaunay Triangulation and Refinement
                    Convex Hulls
                    Ray Casting

Preliminary Benchmarks II

Machine Learning    All Nearest Neighbors
                    Support Vector Machines *
                    K-Means *

Text Processing     Suffix Arrays
                    Edit Distance
                    String Search

Science             Nbody
                    Phylogenetic tree *

Numerical           Sparse Matrix Vector Multiply
                    Sparse Linear Solve *

Preliminary Numbers

Sort Performance, More Detail

Input              weight   STL Sort   Sanders Sort   Quicksort   SampleSort   SampleSort
Cores                       1          32             32          32           1
Uniform            .1       15.8       1.06           4.22        .82          20.2
Exponential        .1       10.8       .79            2.49        .53          13.8
Almost Sorted      .1       3.28       1.11           1.76        .27          5.67
Trigram Strings    .2       58.2       4.63           8.6         1.05         30.8
Strings Permuted   .2       82.5       7.08           28.4        1.76         49.3
Structure          .3       17.6       2.03           6.73        1.18         26.7
Average                     36.4       3.24           10.3        1.08         28.0

The Average row is the mean of each column weighted by the weight column.

All inputs are 100,000,000 long. All code is written for and run on Cilk++ (also tested in Cilk+). All experiments on a 32-core Nehalem (4 × X7560).

Main Techniques

Mostly:
• Functional programming style: no concurrent updates

But also:
• Commutative updates with history-independent data structures
• Priority ordering on elements, i.e. the result is the same as if added sequentially. Makes heavy use of the “write-min” atomic operation.

Easy Cases

Just functional programming style:
• Sample sort
• Nearest Neighbors
• Nbody
• Convex Hull
• Sparse MV Product
• Suffix Arrays
• Ray Casting

3 Examples of Other Cases

• Dictionary
• Breadth First Search
• Delaunay Triangulation and Refinement

Dictionary

Using hashing:
– Based on generic hash and comparison
– Problem: representation can depend on ordering. Also on which redundant element is kept.
– Solution: use a history-independent hash table based on linear probing … representation is independent of the order of insertion
– Use write-min on collision

[Figure: keys 6, 7, 3, 11, 9, 5, 8 and table groups 7, 11 | 3 | 9 | 8, 5 | 6]
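The idea of a layout that does not depend on insertion order can be sketched sequentially. The sketch below assumes a common way to get history independence with linear probing: give keys a fixed priority (here, smaller key wins) and let a higher-priority key displace the occupant of a cell, which then continues probing. The names `HITable`, `EMPTY`, and `cell` and the modulo hash are illustrative. The slides' concurrent version with write-min is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

constexpr int EMPTY = -1;  // marks a free cell; keys must be non-negative

// Sequential sketch of a linear-probing table with priority displacement.
struct HITable {
    std::vector<int> cell;

    explicit HITable(std::size_t m) : cell(m, EMPTY) {}

    // Assumption: a simple modulo hash, for illustration only.
    std::size_t hash(int key) const {
        return static_cast<std::size_t>(key) % cell.size();
    }

    void insert(int key) {
        assert(key >= 0);
        std::size_t i = hash(key);
        for (std::size_t probes = 0; probes < cell.size(); ++probes) {
            if (cell[i] == EMPTY) { cell[i] = key; return; }
            if (cell[i] == key) return;  // duplicate: keep a single copy
            // The smaller key keeps the cell; the larger one moves on.
            if (key < cell[i]) std::swap(key, cell[i]);
            i = (i + 1) % cell.size();
        }
        assert(false && "table is full");
    }
};
```

Inserting the keys from the figure in two different orders into two tables of the same size leaves `cell` identical in both. A parallel algorithm that fills such a table therefore produces the same bytes on every run.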

Breadth First Search (BFS)

Goal: generate the same BFS (spanning) tree as the sequential queue-based algorithm.

Sequential algorithm: [figure not recovered]

Another possible tree: [figure not recovered]

Solution:
– Maintain the frontier and priority-order it
– Use writeMin to choose the winner.
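The two solution bullets can be sketched as a level-by-level BFS in which every frontier vertex bids for its neighbors with its rank in the frontier, and writeMin keeps the lowest bid. This is a sketch under assumptions: the two phases are written as ordinary loops where a real implementation would use parallel loops with a sync between them, and a real implementation would build the next frontier with an order-preserving pack. `bfsTree`, `bid`, and `VISITED` are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // adjacency lists

constexpr int VISITED = -1;  // lower than any rank, so later bids never win

void writeMin(std::atomic<int>& loc, int val) {
    int cur = loc.load();
    while (val < cur && !loc.compare_exchange_weak(cur, val)) {
    }
}

// Returns parent[v] in the BFS tree (parent[root] == root, -1 if unreached).
std::vector<int> bfsTree(const Graph& g, int root) {
    assert(root >= 0 && static_cast<std::size_t>(root) < g.size());
    std::vector<std::atomic<int>> bid(g.size());
    for (auto& b : bid) b.store(INT_MAX);
    std::vector<int> parent(g.size(), -1);
    parent[root] = root;
    bid[root].store(VISITED);
    std::vector<int> frontier{root};
    while (!frontier.empty()) {
        const int k = static_cast<int>(frontier.size());
        // Phase 1 (parallelizable): bid with the frontier rank as priority.
        for (int i = 0; i < k; ++i)
            for (int w : g[frontier[i]]) writeMin(bid[w], i);
        // Phase 2 (after a sync): the lowest-ranked bidder adopts w.
        std::vector<int> next;
        for (int i = 0; i < k; ++i)
            for (int w : g[frontier[i]])
                if (bid[w].load() == i) {
                    parent[w] = frontier[i];
                    bid[w].store(VISITED);
                    next.push_back(w);
                }
        frontier.swap(next);
    }
    return parent;
}
```

The lowest frontier rank is the vertex a sequential queue would dequeue first, so each vertex gets the same parent as in the queue-based algorithm, whatever the order in which the bids arrive.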


Delaunay Triangulation/Refinement

• Incremental algorithm adds one point at a time, but points can be added in parallel if they don’t interact.

• The problem is that the output will depend on the order they are added.

• Adding points deterministically [figures not recovered]


Conclusions

It seems possible to achieve efficient deterministic parallelism for a variety of algorithms.

Still to be addressed: verifying the single write on condition variables and that operations commute.

Parallelism Example: Convex Hull

function hsplit(points, p1, p2) =
let d  = {distance(p, (p1, p2)) : p in points};
    p' = {p in points; d | plusp(d)};
in if (#p' < 2) then [p1] ++ p'
   else let pm = points[max_index(d)];
        in flatten({hsplit(p', p1, p2) : p1 in [p1, pm]; p2 in [pm, p2]})

function convex_hull(points) =
let x    = {x : (x, y) in points};
    minx = points[min_index(x)];
    maxx = points[max_index(x)];
in hsplit(points, minx, maxx) ++ hsplit(points, maxx, minx);

Commuting Operations

More formally (Guy Steele, 1990): if

$$E_1(M) \rightarrow M_a' \Rightarrow V_{1a}, \qquad E_2(M_a') \rightarrow M_a'' \Rightarrow V_{2a}$$

and

$$E_2(M) \rightarrow M_b' \Rightarrow V_{2b}, \qquad E_1(M_b') \rightarrow M_b'' \Rightarrow V_{1b}$$

then E1 and E2 commute with respect to M iff:

$$M_a'' = M_b'', \qquad V_{1a} = V_{1b}, \qquad V_{2a} = V_{2b}$$

where the ⇒ V are the values returned by the operations.

Nested Parallelism: parallel loops

Cilk:
  cilk_for (i=0; i < n; i++)
    B[i] = A[i]+1;

Microsoft TPL (C#, F#):
  Parallel.ForEach(A, x => x+1);

Nesl, Parallel Haskell:
  B = {x + 1 : x in A}

OpenMP:
  #pragma omp for
  for (i=0; i < n; i++)
    B[i] = A[i]+1;

Nested Parallelism: fork-join

Dates back to the 70s or possibly 60s. Used in dialects of Pascal:
  cobegin {
    S1;
    S2; }

Java fork-join framework:
  coinvoke(f1,f2)

Microsoft TPL (C#, F#):
  Parallel.invoke(f1,f2)

Cilk+:
  spawn S1;
  S2;
  sync;

Various functional languages:
  (exp1 || exp2)

Serial Parallel DAGs

Dependence graphs of nested parallel computations are series-parallel.