cilk plus parallel reduction

20
Cilk Plus Reducers Albert DeFusco April 17, 2015 https://github.com/AlbertDeFusco/goldRush

Upload: albert-defusco

Post on 07-Aug-2015

66 views

Category:

Software


10 download

TRANSCRIPT

Cilk Plus Reducers

Albert DeFusco

April 17, 2015

https://github.com/AlbertDeFusco/goldRush

Parallel Gold Sifting

I Each pan can sift a constant amount of dirt/dayI More pans means more dirt sifted

I Each pan sifts independentlyI Each pan has a defined amount of dirt to sift

I Sifting is parallelizable

https://github.com/AlbertDeFusco/goldRush

Parallel Gold Sifting

I Each pan can sift a constant amount of dirt/dayI More pans means more dirt sifted

I Each pan sifts independentlyI Each pan has a defined amount of dirt to sift

I Sifting is parallelizable

https://github.com/AlbertDeFusco/goldRush

Parallel Gold Sifting

I Each pan can sift a constant amount of dirt/dayI More pans means more dirt sifted

I Each pan sifts independentlyI Each pan has a defined amount of dirt to sift

I Sifting is parallelizable

https://github.com/AlbertDeFusco/goldRush

Serial Gold Sifting1 #include < l i s t >2 class pan3 {

4 public :5 pan ( ) ; / / c reate ar ray o f random in tege rs between 1 and 10 ,0006 i n t s i f t ( ) ; / / r e tu rns frequency o f the number 79 ( atomic number o f gold )7 bool hasGold ( ) ; / / c a l l s s i f t ; r e tu rns t rue i f s i f t ( ) > 08 }

910 i n t main ( )11 {

12 std : : l i s t <int> withGold ;13 Pan∗ myPans = new Pan [ nPans ] ;1415 for ( i n t i =0 ; i <nPans ; ++ i )16 {

17 bool gold = myPans [ i ] . hasGold ( ) ;18 i f ( gold ) {19 withGold . push back ( i ) ;20 }

21 }

22 std : : l i s t <int > : : c o n s t i t e r a t o r i t e r a t o r ;23 for ( i t e r a t o r = withGold . begin ( ) ; i t e r a t o r != withGold . end ( ) ; ++ i t e r a t o r )24 std : : cout << ∗ i t e r a t o r << " " ;25 s td : : cout << endl ;2627 return 0 ;28 }

https://github.com/AlbertDeFusco/goldRush

Gold Rush

Gold Rush!

10000 total chunks of dirt1000 pans

Found gold in 15 pansPan IDs: 94 142 265 268 289 440 442 443 569 600 721 781 783 806 818

serial execution took 5.60495 seconds

https://github.com/AlbertDeFusco/goldRush

Parallel Gold Sifting1 #include < l i s t >2 #include < c i l k / c i l k . h>3 class pan4 {

5 public :6 pan ( ) ; / / c reate ar ray o f random in tege rs between 1 and 10 ,0007 i n t s i f t ( ) ; / / r e tu rns frequency o f the number 79 ( atomic number o f gold )8 bool hasGold ( ) ; / / c a l l s s i f t ; r e tu rns t rue i f s i f t ( ) > 09 }

1011 i n t main ( )12 {

13 std : : l i s t <int> withGold ;14 Pan∗ myPans = new Pan [ nPans ] ;1516 c i l k f o r ( i n t i =0 ; i <nPans ; ++ i )17 {

18 bool gold = myPans [ i ] . hasGold ( ) ;19 i f ( gold ) {20 withGold . push back ( i ) ;21 }

22 }

23 std : : l i s t <int > : : c o n s t i t e r a t o r i t e r a t o r ;24 for ( i t e r a t o r = withGold . begin ( ) ; i t e r a t o r != withGold . end ( ) ; ++ i t e r a t o r )25 std : : cout << ∗ i t e r a t o r << " " ;26 s td : : cout << endl ;2728 return 0 ;29 }

https://github.com/AlbertDeFusco/goldRush

Parallel Gold Sifting

I There is a problem to be solvedI How do we keep track of which pans have gold?

I Where does the result get stored?I Who can access the result?

https://github.com/AlbertDeFusco/goldRush

Parallel Gold Sifting1 #include < l i s t >2 #include < c i l k / c i l k . h>3 class pan4 {

5 public :6 pan ( ) ; / / c reate ar ray o f random in tege rs between 1 and 10 ,0007 i n t s i f t ( ) ; / / r e tu rns frequency o f the number 79 ( atomic number o f gold )8 bool hasGold ( ) ; / / c a l l s s i f t ; r e tu rns t rue i f s i f t ( ) > 09 }

1011 i n t main ( )12 {

13 std : : l i s t <int> withGold ;14 Pan∗ myPans = new Pan [ nPans ] ;1516 c i l k f o r ( i n t i =0 ; i <nPans ; ++ i )17 {

18 bool gold = myPans [ i ] . hasGold ( ) ;19 i f ( gold ) {20 withGold . push back ( i ) ;21 }

22 }

23 std : : l i s t <int > : : c o n s t i t e r a t o r i t e r a t o r ;24 for ( i t e r a t o r = withGold . begin ( ) ; i t e r a t o r != withGold . end ( ) ; ++ i t e r a t o r )25 std : : cout << ∗ i t e r a t o r << " " ;26 s td : : cout << endl ;2728 return 0 ;29 }

may not be thread-safe

https://github.com/AlbertDeFusco/goldRush

Thread Safety

I Unsafe operationsI Multiple threads accessing the same address

I Basic types are not thread safeI STL containers may be thread safe for some operations

I Threads read and write memory at undetermined times

I Leads to a race condition

https://github.com/AlbertDeFusco/goldRush

Thread Safety

I Unsafe operationsI Multiple threads accessing the same address

I Basic types are not thread safeI STL containers may be thread safe for some operations

I Threads read and write memory at undetermined times

I Leads to a race condition

https://github.com/AlbertDeFusco/goldRush

Thread Safety

I Unsafe operationsI Multiple threads accessing the same address

I Basic types are not thread safeI STL containers may be thread safe for some operations

I Threads read and write memory at undetermined times

I Leads to a race condition

https://github.com/AlbertDeFusco/goldRush

Race Condition

int total=0;cilk_for(int i=0;i<4;++i) ++total;

Write

Read

0

0

1

1

12

01

total=1

https://github.com/AlbertDeFusco/goldRush

Inefficient solutions

I LockingI Requires careful programmingI Non deterministic

I cannot use cilk sync in the loopI will only sync child threads, not all threads

I Break the loopI Requires more storage and management

1 #include < c i l k / c i l k . h>23 double ∗sum = new double [N ] ;4 / / p a r a l l e l5 c i l k f o r ( i n t i =0 ; i <N;++ i )6 sum[N] = f (N ) ;78 / / s e r i a l9 double t o t a l =0 . 0 ;

10 for ( i n t i =0 ; i <N;++ i )11 t o t a l +=sum[N ] ;

https://github.com/AlbertDeFusco/goldRush

Cilk Reducers

I Provide thread safe access to a “smart pointer”

I Any associative operation is a valid reducer

(x OP y) OP (a OP b) = x OP y OP a OP b

I Small performance overhead for usage

I Very extensible in C++

I Operations are guaranteed to execute in the same order as inserial

https://github.com/AlbertDeFusco/goldRush

Cilk Reducers: views

1 #include < c i l k / c i l k . h>2 #include < c i l k / reducer opadd . h>3 i n t t o t a l =0 ;4 c i l k : : reducer< c i l k : : < op add<int>> r e d u c e r t o t a l ( 0 ) ;5 c i l k f o r ( i n t i =0 ; i <4;++ i )6 ++∗ r e d u c e r t o t a l ;7 t o t a l = r e d u c e r t o t a l . ge t va lue ( ) ;

I At spawn each strand gets a private view of the reducer

I Strands must dereference the pointer to operate on its value

I When strands mergeI views are combined by OPI The combined view is given to the exit strand

https://github.com/AlbertDeFusco/goldRush

Cilk Plus Reducers: views#include <cilk/reducer_opadd.h>int total=0;cilk::reducer<cilk::<op_add<int>> reducer_total (0);for(int i=0;i<4;++i) ++*reducer_total;total = reducer_total.get_value();

Merge update

Private View

total=40

0

0

0

0

0

1

1

1

12

2

4 4

2

2

https://github.com/AlbertDeFusco/goldRush

Cilk Plus Reducers: types

usage: cilk::reducer<cilk::REDUCER_TYPE<<my_type>> my_reducer;

Reducer Type Description Headerop add ++, --, +=, -=, +, - #include <cilk/reducer_add.h>op vector Provides push_back() #include <cilk/reducer_vector.h>op list append Provides push_back() #include <cilk/reducer_list.h>op list prepend Provides push_front() #include <cilk/reducer_list.h>op max Returns maximum #include <cilk/reducer_max.h>op min Returns minimum #include <cilk/reducer_min.h>op ostream Provides << #include <cilk/reducer_string.h>

Table: https://software.intel.com/en-us/node/522606

https://github.com/AlbertDeFusco/goldRush

Parallel gold sifting

1 #include < c i l k / c i l k . h>2 #include < c i l k / r e d u c e r l i s t . h>34 std : : l i s t <int> withGold ;5 c i l k : : reducer< c i l k : : op l i s t append <int> > reducer wi thGold ;6 Pan∗ myPans = new Pan [ nPans ] ;78 c i l k f o r ( i n t i =0 ; i <nPans ; ++ i )9 {

10 bool gold = myPans [ i ] . hasGold ( ) ;11 i f ( gold ) {12 reducer wi thGold−>push back ( i ) ;13 }

14 }

15 withGold = reducer wi thGold . ge t va lue ( ) ;1617 std : : l i s t <int > : : c o n s t i t e r a t o r i t e r a t o r ;18 for ( i t e r a t o r = withGold . begin ( ) ; i t e r a t o r != withGold . end ( ) ; ++ i t e r a t o r )19 std : : cout << ∗ i t e r a t o r << " " ;20 s td : : cout << endl ;

https://github.com/AlbertDeFusco/goldRush

Gold Rush

$>cat /proc/cpuinfo | grep Xeon | uniq -c16 model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz

$>CILK_NWORKERS=16 ./goldRushGold Rush!

100000 total chunks of dirt1000 pans

Found gold in 15 pansPan IDs: 94 142 265 268 289 440 442 443 569 600 721 781 783 806 818

Cilk identified the correct pans

serial execution took 5.60154 seconds

parallel execution took 0.39801 seconds with 16 workersparallel speedup 14.0739paralell efficiency 0.879616

https://github.com/AlbertDeFusco/goldRush