01 l 6 multicore programming · multicore programming julian shun 1. multicore processors 2 q why...

75
!"##$ %&'&( ! "#) +)$#) +, -./01 6.172 Performance Engineering of Software Systems © 2008–2018 by the MIT 6.172 Lecturers LECTURE 6 Multicore Programming Julian Shun 1

Upload: others

Post on 15-Oct-2020

17 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

!"##$ %&'&(!"#)*+)$#)*+,*-./01

6.172 Performance Engineering of Software Systems

© 2008–2018 by the MIT 6.172 Lecturers

LECTURE 6 Multicore Programming Julian Shun

1

Page 2: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Multicore Processors

2

Q Why do semicon-ductor vendors provide chips with multiple processor cores?

A Because of Moore’s Law and the end of the scaling of clock frequency.

Intel Haswell-E

© Intel. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/

© 2008–2018 by the MIT 6.172 Lecturers

Page 3: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Technology Scaling Transistor

count is still 10,000,000

1,000,000

100,000

Transistors x 1000! Clock frequency (MHz)

rising, …

but clock speedis bounded at

~4GHz.

10,000

1,000

100

10

1

1970 1980 1990 2000 2010

© 2008–2018 by the MIT 6.172 Lecturers 3

0

Page 4: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Power Density

Source: Patrick Gelsinger, Intel Developer’s Forum, Intel Corporation, 2004.

Projected power density, if clock frequency had continued its trend of scaling 25%-30% per year.

© Paul Gelsinger at Intel Corporation. All rights reserved. This content is excluded from our Creative Commons license. For more information, see

© 2008–2018 by the MIT 6.172 Lecturers 4 https://ocw.mit.edu/help/faq-fair-use/

Page 5: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Technology Scaling

10,000,000

© 2008–2018 by the MIT 6.172 Lecturers

0

1

10

100

1,000

10,000

100,000

1,000,000

1970 1980 1990 2000 2010

Transistors x 1000 ! Clock frequency (MHz) ! Power (W) " Cores

2010Each generation of

Moore’s Law potentially doubles

the number of cores. 5

Page 6: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Abstract Multicore Architecture

Memory I/O

$

P

$

P

$

P

Network

Chip Multiprocessor (CMP) © 2008–2018 by the MIT 6.172 Lecturers 6

Page 7: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

OUTLINE

• Shared-Memory Hardware • Concurrency Platforms Pthreads (and WinAPI Threads) Threading Building Blocks OpenMP Cilk Plus

© 2008–2018 by the MIT 6.172 Lecturers 7

Page 8: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Cache Coherence

… Load !

P P P

!"#

© 2008–2018 by the MIT 6.172 Lecturers 8

!"#!"#

Page 9: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Cache Coherence

© 2008–2018 by the MIT 6.172 Lecturers

… P P P

!"# !"#

Load !

!"#!"#

9

Page 10: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Cache Coherence

!"#

… P P P

!"# !"# Load !

!"# !"#

© 2008–2018 by the MIT 6.172 Lecturers 10

Page 11: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Cache Coherence

© 2008–2018 by the MIT 6.172 Lecturers

!"#

… P P P

!"# !"# !"#

Load !

!"#

11

Page 12: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Cache Coherence

… P P P

!"# !"# !"# Store !

!"$

© 2008–2018 by the MIT 6.172 Lecturers 12

!"#

Page 13: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Cache Coherence

!"#

… P P P

!"# !"# !"$ Load ! !"#

Oops!

© 2008–2018 by the MIT 6.172 Lecturers 13

Page 14: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

MSI Protocol

Each cache line is labeled with a state: • !: cache block has been modified. No other

caches contain this block in ! or ( states. • ": other caches may be sharing this block. • #: cache block is invalid (same as not there).

!"#$%&' ("#)%&* +"#,%-

("#)%&* !"#,%*

+"#$%.

+"#,%'

+"#$%&/ ("#)%&*

Before a cache modifies a location, the hardware first invalidates all other copies.

© 2008–2018 by the MIT 6.172 Lecturers 14

Page 15: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

MSI Protocol

Each cache line is labeled with a state: • !: cache block has been modified. No other

caches contain this block in ! or ( states. • ": other caches may be sharing this block. • #: cache block is invalid (same as not there).

!"#$%&' ("#)%&* +"#,%-

("#)%&* !"#,%*

+"#$%.

+"#,%'

+"#$%&/ ("#)%&*

Store )%0

© 2008–2018 by the MIT 6.172 Lecturers 15

Page 16: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

MSI Protocol

Each cache line is labeled with a state: • !: cache block has been modified. No other

caches contain this block in ! or ( states. • ": other caches may be sharing this block. • #: cache block is invalid (same as not there).

!"#$%&' ("#)%&* +"#,%-

("#)%&* !"#,%*

+"#$%.

+"#,%'

+"#$%&/ ("#)%&*

Store )%0

© 2008–2018 by the MIT 6.172 Lecturers 16

Page 17: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

MSI Protocol

Each cache line is labeled with a state: • !: cache block has been modified. No other

caches contain this block in ! or - states. • ": other caches may be sharing this block. • #: cache block is invalid (same as not there).

!"#$%&' ("#)%&* ("#+%,

-"#)%&* !"#+%*

("#$%.

("#+%'

("#$%&/ ("#)%&*

Store )%0

© 2008–2018 by the MIT 6.172 Lecturers 17

Page 18: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

MSI Protocol

Each cache line is labeled with a state: • !: cache block has been modified. No other

caches contain this block in ! or - states. • ": other caches may be sharing this block. • #: cache block is invalid (same as not there).

!"#$%&' ("#)%&* ("#+%,

!"#)%. !"#+%*

("#$%/

("#+%'

("#$%&0 ("#)%&*

Store )%.

© 2008–2018 by the MIT 6.172 Lecturers 18

Page 19: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

MSI Protocol

Each cache line is labeled with a state: • !: cache block has been modified. No other

caches contain this block in ! or - states. • ": other caches may be sharing this block. • #: cache block is invalid (same as not there).

!"#$%&' ("#)%&* ("#+%,

!"#)%. !"#+%*

("#$%/

("#+%'

("#$%&0 ("#)%&*

Store )%.

© 2008–2018 by the MIT 6.172 Lecturers 19

Page 20: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Outline

• Shared-Memory Hardware • Concurrency Platforms Pthreads (and WinAPI Threads) Threading Building Blocks OpenMP Cilk

© 2008–2018 by the MIT 6.172 Lecturers 20

Page 21: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Concurrency Platforms

• Programming directly on processor cores is painful and error-prone.

• A concurrency platform abstracts processor cores, handles synchronization and communication protocols, and performs load balancing.

• Examples Pthreads and WinAPI threads Threading Building Blocks (TBB) OpenMP Cilk

© 2008–2018 by the MIT 6.172 Lecturers 21

Page 22: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci Numbers The Fibonacci numbers are the sequence !!, ", ", #, $, %, &, "$, #", $', …", where each number is the sum of the previous two.

Recurrence: (! = !, (" = ", () = ()*" + ()*# for ) > ".

The sequence is named after Leonardo di Pisa (1170– 1250 A.D.), also known as Fibonacci, a contraction of filius Bonaccii —“son of Bonaccio.” Fibonacci’s 1202 book Liber Abaci introduced the sequence to Western mathematics, although it had previously been discovered by Indian mathematicians.

© 2008–2018 by the MIT 6.172 Lecturers 22 Image is in the public domain.

Page 23: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci Program !"#$%&'()*"#++,-(./01 !"#$%&'()*.+'"2/01 !"#$%&'()*.+'%"3/01

"#+456+ 7"38"#+456+)#9):) "7 8# *);9):) <(+&<#)#=)

>)(%.( : "#+456+)? @ 7"38#AB9= "#+456+), @)7"38#A;9= <(+&<#)8? C),9=

> >

"#+ DE"#8"#+ E<F$G $0E<)HE<FIJK9): "#+456+)# @)E+2"8E<FIJBK9= "#+456+)<(.&%+)@)7"38#9= -<"#+78LM"32#E$$")27)NL OPQ'45)L)".)NL)OPQ'45)L/R#LG)))

#G)<(.&%+9= <(+&<#)S=

>

Disclaimer to Algorithms Police This recursive program is a poor way to compute the #th Fibonacci number, but it provides for a good didactic example.

23 © 2008–2018 by the MIT 6.172 Lecturers

Page 24: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci Execution

!"#$)& !"#$*&

Key idea for parallelization The calculations of !"#$+,)&and !"#$+,(& can be executed simultaneously without mutual interference.

"+-.%/- !"#$"+-.%/-0+&010 "! $+ 20(&010 34-53+0+60

704894 1 "+-.%/-0: ; !"#$+,)&6 "+-.%/-0< ;0!"#$+,(&6 34-53+0$: =0<&6

7 7

© 2008–2018 by the MIT 6.172 Lecturers 24

!"#$%&

!"#$(&

!"#$)& !"#$*& !"#$)&

!"#$'&

!"#$(&

Page 25: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

OUTLINE

• Shared-Memory Hardware• Concurrency Platforms

• Pthreads (and WinAPI Threads)• Threading Building Blocks• OpenMP• Cilk

© 2008–2018 by the MIT 6.172 Lecturers 25

Page 26: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Pthreads*

∙ Standard API for threading specified byANSI/IEEE POSIX 1003.1-2008.

∙ Do-it-yourself concurrency platform.∙ Built as a library of functions with “special”

non-C semantics.∙ Each thread implements an abstraction of a

processor, which are multiplexed onto machineresources.

∙ Threads communicate though shared memory.∙ Library functions mask the protocols involved

in interthread coordination.

*WinAPI threads provide similar functionality.© 2008–2018 by the MIT 6.172 Lecturers 26

Page 27: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Key Pthread Functions

© 2008–2018 by the MIT 6.172 Lecturers

!"# $#%&'()*+&'(#', $#%&'()*# -#%&'(). !!"#$%"&#'()'#&$)*)#"(*+"($,#(&#-($,"#.'(

+/"0#1$#%&'()*(##&*# -(##&.1 !!+/0#1$($+(2#$($,"#.'(.$$")/%$#2(34566(*+"('#*.%7$8

2/!)1-,-34"+5,2/!) -5.1 !!"+%$)&#(#9#1%$#'(.*$#"(1"#.$)+&(

2/!)1-(&6 !!.(2)&:7#(.":%;#&$(<.22#'($+(*%&1

5 !!"#$%"&2(#""+"(2$.$%2

!"# $#%&'()*7/!", $#%&'()*# #%&'(). !!)'#&$)*)#"(+*($,"#.'($+(-.)$(*+"

2/!)1--0#(#40 !!$#";)&.$)&:($,"#.'=2(2$.$%2(34566($+():&+"#8

51!!"#$%"&2(#""+"(2$.$%2

27

Page 28: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Pthread Implementation

F4"')G+02(3'89&#$:F4"' G-+2;)< "#+678+)" A)::+02(3'832E. G;)-+2;B1"#-&+> ::+02(3'832E. G;)-+2;B14&+-&+)A)9"5:";> 2(+&2#)HIJJ>

?

"#+ K3"#:"#+ 32E$L $032)G32EFMN;)< -+02(3'8+ +02(3'> +02(3'832E. 32E.>"#+ .+3+&.>"#+678+)2(.&%+>

"9):32E$ * =;)<)2(+&2#)C>)? "#+678+)# A .+2+4&%:32EFMCNL)HIJJL)O;> "9):#)* PO;)<

2(.&%+)A)9"5:#;> ?)(%.()<

32E./"#-&+ A)#BC> .+3+&.)A)!"#$%&'()$%&"%:Q+02(3'L)

HIJJL) +02(3'89&#$L):F4"'G;)Q32E.;>

!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-"9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)A)9"5:#B=;> !!".$%)"/(0")1+")10+$2")(")+0#%&$)+ .+3+&.)A)!"#$%&'(*+,-:+02(3'L)HIJJ;> "9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)DA)32E./4&+-&+>

? -2"#+9:ST"54#3$$")49)US VWX'67)S)".)US)VWX'67)S/Y#SL)))

#L)2(.&%+;> 2(+&2#)O>

?

28 © 2008–2018 by the MIT 6.172 Lecturers

!"#$%&'()*"#++,-(./01 !"#$%&'()*-+02(3'/01 !"#$%&'()*.+'"4/01 !"#$%&'()*.+'%"5/01

"#+678+ 9"5:"#+678+)#;)<) "9 :# *)=;)<)

2(+&2#)#>) ?)(%.( <

"#+678+)@ A 9"5:#BC;> "#+678+), A)9"5:#B=;> 2(+&2#):@ D),;>

+,-('(9 .+2&$+ < "#+678+)"#-&+> "#+678+)4&+-&+>

?)+02(3'832E.>

? ?

Page 29: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Pthread Implementation !"#$%&'()*"#++,-(./01 !"#$%&'()*-+02(3'/01 !"#$%&'()*.+'"4/01 !"#$%&'()*.+'%"5/01

"#+678+ 9"5:"#+678+)#;)<) "9 :# *)=;)<)

2(+&2#)#>) ?)(%.( <

"#+678+)@ A 9"5:#BC;> "#+678+), A)9"5:#B=;> 2(+&2#):@ D),;>

? ?

+,-('(9 .+2&$+ < "#+678+)"#-&+> "#+678+)4&+-&+>

?)+02(3'832E.>

F4"')G+02(3'89&#$:F4"' G-+2;)< "#+678+)" A)::+02(3'832E. G;)-+2;B1"#-&+> ::+02(3'832E. G;)-+2;B14&+-&+)A)9"5:";> 2(+&2#)HIJJ>

?

"#+ K3"#:"#+ 32E$L $032)G32EFMN;)< -+02(3'8+ +02(3'> +02(3'832E. 32E.>"#+ .+3+&.>"#+678+)2(.&%+>

"9):32E$ * =;)<)2(+&2#)C>)? "#+678+)# A .+2+4&%:32EFMCNL)HIJJL)O;> "9):#)* PO;)<

2(.&%+)A)9"5:#;> ?)(%.()<

32E./"#-&+ A)#BC> .+3+&.)A)!"#$%&'()$%&"%:Q+02(3'L)

HIJJL) +02(3'89&#$L):F4"'G;)Q32E.;>

!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-"9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)A)9"5:#B=;> !!".$%)"/(0")1+")10+$2")(")+0#%&$)+ .+3+&.)A)!"#$%&'(*+,-:+02(3'L)HIJJ;> "9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)DA)32E./4&+-&+>

? -2"#+9:ST"54#3$$")49)US VWX'67)S)".)US)VWX'67)S/Y#SL)))

#L)2(.&%+;> 2(+&2#)O>

?

Original code.

29

© 2008–2018 by the MIT 6.172 Lecturers

Page 30: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Pthread Implementation !"#$%&'()*"#++,-(./01 !"#$%&'()*-+02(3'/01 !"#$%&'()*.+'"4/01 !"#$%&'()*.+'%"5/01

"#+678+ 9"5:"#+678+)#;)<) "9 :# *)=;)<)

2(+&2#)#>) ?)(%.( <

"#+678+)@ A 9"5:#BC;> "#+678+), A)9"5:#B=;> 2(+&2#):@ D),;>

? ?

+,-('(9 .+2&$+ < "#+678+)"#-&+> "#+678+)4&+-&+>

?)+02(3'832E.>

F4"')G+02(3'89&#$:F4"' G-+2;)< "#+678+)" A)::+02(3'832E. G;)-+2;B1"#-&+> ::+02(3'832E. G;)-+2;B14&+-&+)A)9"5:";> 2(+&2#)HIJJ>

?

"#+ K3"#:"#+ 32E$L $032)G32EFMN;)< -+02(3'8+ +02(3'> +02(3'832E. 32E.>"#+ .+3+&.>"#+678+)2(.&%+>

"9):32E$ * =;)<)2(+&2#)C>)? "#+678+)# A .+2+4&%:32EFMCNL)HIJJL)O;> "9):#)* PO;)<

2(.&%+)A)9"5:#;> ?)(%.()<

32E./"#-&+ A)#BC> .+3+&.)A)!"#$%&'()$%&"%:Q+02(3'L)

HIJJL) +02(3'89&#$L):F4"'G;)Q32E.;>

!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-"9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)A)9"5:#B=;> !!".$%)"/(0")1+")10+$2")(")+0#%&$)+ .+3+&.)A)!"#$%&'(*+,-:+02(3'L)HIJJ;> "9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)DA)32E./4&+-&+>

? -2"#+9:ST"54#3$$")49)US VWX'67)S)".)US)VWX'67)S/Y#SL)))

#L)2(.&%+;> 2(+&2#)O>

?

Structure for thread

arguments.

30

© 2008–2018 by the MIT 6.172 Lecturers

Page 31: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Pthread Implementation

© 2008–2018 by the MIT 6.172 Lecturers

!"#$%&'()*"#++,-(./01 !"#$%&'()*-+02(3'/01 !"#$%&'()*.+'"4/01 !"#$%&'()*.+'%"5/01

"#+678+ 9"5:"#+678+)#;)<) "9 :# *)=;)<)

2(+&2#)#>) ?)(%.( <

"#+678+)@ A 9"5:#BC;> "#+678+), A)9"5:#B=;> 2(+&2#):@ D),;>

? ?

+,-('(9 .+2&$+ < "#+678+)"#-&+> "#+678+)4&+-&+>

?)+02(3'832E.>

F4"')G+02(3'89&#$:F4"' G-+2;)< "#+678+)" A)::+02(3'832E. G;)-+2;B1"#-&+> ::+02(3'832E. G;)-+2;B14&+-&+)A)9"5:";> 2(+&2#)HIJJ>

?

"#+ K3"#:"#+ 32E$L $032)G32EFMN;)< -+02(3'8+ +02(3'> +02(3'832E. 32E.>"#+ .+3+&.>"#+678+)2(.&%+>

"9):32E$ * =;)<)2(+&2#)C>)? "#+678+)# A .+2+4&%:32EFMCNL)HIJJL)O;> "9):#)* PO;)<

2(.&%+)A)9"5:#;> ?)(%.()<

32E./"#-&+ A)#BC> .+3+&.)A)!"#$%&'()$%&"%:Q+02(3'L)

HIJJL) +02(3'89&#$L):F4"'G;)Q32E.;>

!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-"9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)A)9"5:#B=;> !!".$%)"/(0")1+")10+$2")(")+0#%&$)+ .+3+&.)A)!"#$%&'(*+,-:+02(3'L)HIJJ;> "9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)DA)32E./4&+-&+>

? -2"#+9:ST"54#3$$")49)US VWX'67)S)".)US)VWX'67)S/Y#SL)))

#L)2(.&%+;> 2(+&2#)O>

?

F4"' G-+2;)<

Function called when

thread is created.

31

Page 32: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Pthread Implementation !"#$%&'()*"#++,-(./01 !"#$%&'()*-+02(3'/01 !"#$%&'()*.+'"4/01 !"#$%&'()*.+'%"5/01

"#+678+ 9"5:"#+678+)#;)<) "9 :# *)=;)<)

2(+&2#)#>) ?)(%.( <

"#+678+)@ A 9"5:#BC;> "#+678+), A)9"5:#B=;> 2(+&2#):@ D),;>

? ?

+,-('(9 .+2&$+ < "#+678+)"#-&+> "#+678+)4&+-&+>

?)+02(3'832E.>

F4"')G+02(3'89&#$:F4"' G-+2;)< "#+678+)" A)::+02(3'832E. G;)-+2;B1"#-&+> ::+02(3'832E. G;)-+2;B14&+-&+)A)9"5:";> 2(+&2#)HIJJ>

?

"#+ K3"#:"#+ 32E$L $032)G32EFMN;)< -+02(3'8+ +02(3'>+02(3'832E. 32E.> "#+ .+3+&.> "#+678+)2(.&%+>

"9):32E$ * =;)<)2(+&2#)C>)? "#+678+)# A .+2+4&%:32EFMCNL)HIJJL)O;> "9):#)* PO;)<

2(.&%+)A)9"5:#;> ?)(%.()<

32E./"#-&+ A)#BC> .+3+&.)A)!"#$%&'()$%&"%:Q+02(3'L)

HIJJL) +02(3'89&#$L):F4"'G;)Q32E.;>

!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-"9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)A)9"5:#B=;> !!".$%)"/(0")1+")10+$2")(")+0#%&$)+ .+3+&.)A)!"#$%&'(*+,-:+02(3'L)HIJJ;> "9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)DA)32E./4&+-&+>

? -2"#+9:ST"54#3$$")49)US VWX'67)S)".)US)VWX'67)S/Y#SL)))

#L)2(.&%+;> 2(+&2#)O>

?

"#++,-(./0"#++,-(./01-+02(3'/0.+'"4/0.+'%"5/0

"#+678+)

9"5:#9"5:#BC;>9"5:#9"5:#B=;>

"#+-+02(3'8++02(3'832E."#+"#+678+)

"9)"#+678+)"9)"9)

"#+

"#++,-(./0"#++,-(./0

"#+678+)"#+678+)

No point in creating thread if there isn’t

enough to do.

32 © 2008–2018 by the MIT 6.172 Lecturers

Page 33: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Pthread Implementation

F4"')G+02(3'89&#$:F4"' G-+2;)< "#+678+)" A)::+02(3'832E. G;)-+2;B1"#-&+> ::+02(3'832E. G;)-+2;B14&+-&+)A)9"5:";> 2(+&2#)HIJJ>

?

"#+ K3"#:"#+ 32E$L $032)G32EFMN;)< -+02(3'8+ +02(3'> +02(3'832E. 32E.>"#+ .+3+&.>"#+678+)2(.&%+>

"9):32E$ * =;)<)2(+&2#)C>)? "#+678+)# A .+2+4&%:32EFMCNL)HIJJL)O;> "9):#)* PO;)<

2(.&%+)A)9"5:#;> ?)(%.()<

32E./"#-&+ A)#BC> .+3+&.)A)!"#$%&'()$%&"%:Q+02(3'L)

HIJJL) +02(3'89&#$L):F4"'G;)Q32E.;>

!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-"9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)A)9"5:#B=;> !!".$%)"/(0")1+")10+$2")(")+0#%&$)+ .+3+&.)A)!"#$%&'(*+,-:+02(3'L)HIJJ;> "9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)DA)32E./4&+-&+>

? -2"#+9:ST"54#3$$")49)US VWX'67)S)".)US)VWX'67)S/Y#SL)))

#L)2(.&%+;> 2(+&2#)O>

?

2(+&2#)C>)?.+2+4&%:32EFMCNL)HIJJ

Marshal input argument to

thread.

33 © 2008–2018 by the MIT 6.172 Lecturers

!"#$%&'()*"#++,-(./01 !"#$%&'()*-+02(3'/01 !"#$%&'()*.+'"4/01 !"#$%&'()*.+'%"5/01

"#+678+ 9"5:"#+678+)#;)<) "9 :# *)=;)<)

2(+&2#)#>) ?)(%.( <

"#+678+)@ A 9"5:#BC;> "#+678+), A)9"5:#B=;> 2(+&2#):@ D),;>

? ?

+,-('(9 .+2&$+ < "#+678+)"#-&+> "#+678+)4&+-&+>

?)+02(3'832E.>

Page 34: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Pthread Implementation !"#$%&'()*"#++,-(./01 !"#$%&'()*-+02(3'/01 !"#$%&'()*.+'"4/01 !"#$%&'()*.+'%"5/01

"#+678+ 9"5:"#+678+)#;)<) "9 :# *)=;)<)

2(+&2#)#>) ?)(%.( <

"#+678+)@ A 9"5:#BC;> "#+678+), A)9"5:#B=;> 2(+&2#):@ D),;>

? ?

+,-('(9 .+2&$+ < "#+678+)"#-&+> "#+678+)4&+-&+>

?)+02(3'832E.>

F4"')G+02(3'89&#$:F4"' G-+2;)< "#+678+)" A)::+02(3'832E. G;)-+2;B1"#-&+> ::+02(3'832E. G;)-+2;B14&+-&+)A)9"5:";> 2(+&2#)HIJJ>

?

"#+ K3"#:"#+ 32E$L $032)G32EFMN;)< -+02(3'8+ +02(3'> +02(3'832E. 32E.>"#+ .+3+&.>"#+678+)2(.&%+>

"9):32E$ * =;)<)2(+&2#)C>)? "#+678+)# A .+2+4&%:32EFMCNL)HIJJL)O;> "9):#)* PO;)<

2(.&%+)A)9"5:#;> ?)(%.()<

32E./"#-&+ A)#BC> .+3+&.)A)!"#$%&'()$%&"%:Q+02(3'L)

HIJJL) +02(3'89&#$L):F4"'G;)Q32E.;>

!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-"9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)A)9"5:#B=;> !!".$%)"/(0")1+")10+$2")(")+0#%&$)+ .+3+&.)A)!"#$%&'(*+,-:+02(3'L)HIJJ;> "9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)DA)32E./4&+-&+>

? -2"#+9:ST"54#3$$")49)US VWX'67)S)".)US)VWX'67)S/Y#SL)))

#L)2(.&%+;> 2(+&2#)O>

?

-+2;)<

.+3+&.)A)

!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-"9):.+3+&.)2(.&%+)A)G;)-+2; "#-&+>

;)<

Create thread to execute 9"5:#ZC;.

34

© 2008–2018 by the MIT 6.172 Lecturers

Page 35: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Pthread Implementation !"#$%&'()*"#++,-(./01 !"#$%&'()*-+02(3'/01 !"#$%&'()*.+'"4/01 !"#$%&'()*.+'%"5/01

"#+678+ 9"5:"#+678+)#;)<) "9 :# *)=;)<)

2(+&2#)#>) ?)(%.( <

"#+678+)@ A 9"5:#BC;> "#+678+), A)9"5:#B=;> 2(+&2#):@ D),;>

? ?

+,-('(9 .+2&$+ < "#+678+)"#-&+> "#+678+)4&+-&+>

?)+02(3'832E.>

F4"')G+02(3'89&#$:F4"' G-+2;)< "#+678+)" A)::+02(3'832E. G;)-+2;B1"#-&+> ::+02(3'832E. G;)-+2;B14&+-&+)A)9"5:";> 2(+&2#)HIJJ>

?

"#+ K3"#:"#+ 32E$L $032)G32EFMN;)< -+02(3'8+ +02(3'> +02(3'832E. 32E.>"#+ .+3+&.>"#+678+)2(.&%+>

"9):32E$ * =;)<)2(+&2#)C>)? "#+678+)# A .+2+4&%:32EFMCNL)HIJJL)O;> "9):#)* PO;)<

2(.&%+)A)9"5:#;> ?)(%.()<

32E./"#-&+ A)#BC> .+3+&.)A)!"#$%&'()$%&"%:Q+02(3'L)

HIJJL) +02(3'89&#$L):F4"'G;)Q32E.;>

!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-"9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)A)9"5:#B=;> !!".$%)"/(0")1+")10+$2")(")+0#%&$)+ .+3+&.)A)!"#$%&'(*+,-:+02(3'L)HIJJ;> "9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)DA)32E./4&+-&+>

? -2"#+9:ST"54#3$$")49)US VWX'67)S)".)US)VWX'67)S/Y#SL)))

#L)2(.&%+;> 2(+&2#)O>

?

F4"'+02(3'832E.;)-+2

32E./"#-&+.+3+&.)

!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-"9)2(.&%+)!!".$%)"/(0")1+")10+$2")(")+0#%&$)++02(3'832E. ;)-+2;B1B1"#-&+>

;)-+2; 4&+-&+) 9"5:9"5:";>

Main program executes

9"5:#Z=; "#) -323%%(%.

35

© 2008–2018 by the MIT 6.172 Lecturers

Page 36: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Pthread Implementation !"#$%&'()*"#++,-(./01 !"#$%&'()*-+02(3'/01 !"#$%&'()*.+'"4/01 !"#$%&'()*.+'%"5/01

"#+678+ 9"5:"#+678+)#;)<) "9 :# *)=;)<)

2(+&2#)#>) ?)(%.( <

"#+678+)@ A 9"5:#BC;> "#+678+), A)9"5:#B=;> 2(+&2#):@ D),;>

? ?

+,-('(9 .+2&$+ < "#+678+)"#-&+> "#+678+)4&+-&+>

?)+02(3'832E.>

F4"')G+02(3'89&#$:F4"' G-+2;)< "#+678+)" A)::+02(3'832E. G;)-+2;B1"#-&+> ::+02(3'832E. G;)-+2;B14&+-&+)A)9"5:";> 2(+&2#)HIJJ>

?

"#+ K3"#:"#+ 32E$L $032)G32EFMN;)< -+02(3'8+ +02(3'> +02(3'832E. 32E.>"#+ .+3+&.>"#+678+)2(.&%+>

"9):32E$ * =;)<)2(+&2#)C>)? "#+678+)# A .+2+4&%:32EFMCNL)HIJJL)O;> "9):#)* PO;)<

2(.&%+)A)9"5:#;> ?)(%.()<

32E./"#-&+ A)#BC> .+3+&.)A)!"#$%&'()$%&"%:Q+02(3'L)

HIJJL) +02(3'89&#$L):F4"'G;)Q32E.;>

!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-"9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)A)9"5:#B=;> !!".$%)"/(0")1+")10+$2")(")+0#%&$)+ .+3+&.)A)!"#$%&'(*+,-:+02(3'L)HIJJ;> "9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)DA)32E./4&+-&+>

? -2"#+9:ST"54#3$$")49)US VWX'67)S)".)US)VWX'67)S/Y#SL)))

#L)2(.&%+;> 2(+&2#)O>

?

F4"' -+2;)<+02(3'832E. G;)-+2;B1B1"#-&+>;)-+2;B14&+-&+)A)9"5:9"5:";>

32E./"#-&+.+3+&.)

!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-"9)2(.&%+)!!".$%)"/(0")1+")10+$2")(")+0#%&$)+.+3+&.)

F4"' G-+2;)<

Block until the auxiliary thread

finishes.

36 © 2008–2018 by the MIT 6.172 Lecturers

Page 37: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Pthread Implementation !"#$%&'()*"#++,-(./01 !"#$%&'()*-+02(3'/01 !"#$%&'()*.+'"4/01 !"#$%&'()*.+'%"5/01

"#+678+ 9"5:"#+678+)#;)<) "9 :# *)=;)<)

2(+&2#)#>) ?)(%.( <

"#+678+)@ A 9"5:#BC;> "#+678+), A)9"5:#B=;> 2(+&2#):@ D),;>

? ?

+,-('(9 .+2&$+ < "#+678+)"#-&+> "#+678+)4&+-&+>

?)+02(3'832E.>

F4"')G+02(3'89&#$:F4"' G-+2;)< "#+678+)" A)::+02(3'832E. G;)-+2;B1"#-&+> ::+02(3'832E. G;)-+2;B14&+-&+)A)9"5:";> 2(+&2#)HIJJ>

?

"#+ K3"#:"#+ 32E$L $032)G32EFMN;)< -+02(3'8+ +02(3'> +02(3'832E. 32E.>"#+ .+3+&.>"#+678+)2(.&%+>

"9):32E$ * =;)<)2(+&2#)C>)? "#+678+)# A .+2+4&%:32EFMCNL)HIJJL)O;> "9):#)* PO;)<

2(.&%+)A)9"5:#;> ?)(%.()<

32E./"#-&+ A)#BC> .+3+&.)A)!"#$%&'()$%&"%:Q+02(3'L)

HIJJL) +02(3'89&#$L):F4"'G;)Q32E.;>

!!"#$%&"'$&"'(&)%&*+"+,+'*)%&-"9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)A)9"5:#B=;> !!"#$%&"'()"&*+"&*)+$,"&("&+)-%.$&+ .+3+&.)A)!"#$%&'(*+,-:+02(3'L)HIJJ;> "9):.+3+&.)RA HIJJ;)<)2(+&2#)C>)? 2(.&%+)DA)32E./4&+-&+>

? -2"#+9:ST"54#3$$")49)US VWX'67)S)".)US)VWX'67)S/Y#SL)))

#L)2(.&%+;> 2(+&2#)O>

?

2(.&%+)?-2"#+9

2(+&2#)?

Add the results together to produce the final output.

37 © 2008–2018 by the MIT 6.172 Lecturers

Page 38: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

The cost of creating a thread >104

cycles ! coarse-grained concurrency. (Thread pools can help.) Fibonacci code gets at most about 1.5 speedup for 2 cores. Need a rewrite for more cores. The Fibonacci logic is no longer neatly encapsulated in the !"#$% function. Programmers must marshal arguments (shades of 1958! ) and engage in error-prone protocols in order to load-balance.

38

Issues with Pthreads

© 2008–2018 by the MIT 6.172 Lecturers

Overhead

Scalability

Modularity

Code Simplicity

Page 39: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Outline

• Shared-Memory Hardware• Concurrency Platforms

• Pthreads (and WinAPI Threads)• Threading Building Blocks• OpenMP• Cilk

© 2008–2018 by the MIT 6.172 Lecturers 39

Page 40: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Threading Building Blocks

! Developed by Intel.! Implemented as a C!! library

that runs on top of nativethreads.

! Programmer specifies tasksrather than threads.

! Tasks are automatically loadbalanced across the threadsusing a work-stealingalgorithm inspired by researchat MIT.

! Focus on performance.

© 2008–2018 by the MIT 6.172 Lecturers 40

Page 41: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci in TBB

!"#$%&$'()"*'+)&,--. +/'""&0#-1'"23&*!-/#+&,'"2&4 *!-/#+3

+5$", #$,678,&$.&#$,678,9&+5$",&"!(.&

0#-1'"2:#$,678,&$8;&#$,678,9&"!(8<&3&

$:$8<;&"!(:"!(8<&4=&

,'"29&)>)+!,):<&4& #?:&$&@ A <&4

9"!(&B $. =&)/")&4

#$,678,&>;&C.& 0#-1'"2D ' B 9$)E:&'//5+',)8+F#/G:<&<

0#-1'"2:$HI;&D><.& 0#-1'"2D&- B 9$)E:&'//5+',)8+F#/G:<&<&

0#-1'"2:$HA;&DC<.& !"#$%"&$'()*#:J<.& !+,-*:-<.& !+,-*$,*.$-,/#$&(%$,00:'<.& 9"!(&B >&K&C.

=& L),!L$&MNOO.&

= =.

P#$+/!G)&@+",G#$,Q P#$+/!G)&@#5",L)'(Q P#$+/!G)&R,--S,'"2TFR

#$, ('#$:#$, 'L%+; +F'L&9'L%UVW<&4 #$,678,&L)". #?&:'L%+ @ A<&4&L),!L$&I.&= #$,678,&$ B

",L,5!/:'L%UVIW;&MNOO;&X<. 0#-1'"2D ' B 9$)E:,'"233'//5+',)8L55,:<<

0#-1'"2:$;&DL)"<.& ,'"233!+,-*$%((#$,*.$-,/#:'<.

",G33+5!, @@&R0#-5$'++#&5?&R&@@&$& @@ R&#"&R&@@&L)"&@@&",G33)$G/.

L),!L$&X. =

41 © 2008–2018 by the MIT 6.172 Lecturers

Page 42: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci in TBB

!"#$%&$'()"*'+)&,--. +/'""&0#-1'"23&*!-/#+&,'"2&4 *!-/#+3

+5$", #$,678,&$.&#$,678,9&+5$",&"!(.&

0#-1'"2:#$,678,&$8;&#$,678,9&"!(8<&3&

$:$8<;&"!(:"!(8<&4=&

,'"29&)>)+!,):<&4& #?:&$&@ A <&4

9"!(&B $. =&)/")&4

#$,678,&>;&C.& 0#-1'"2D ' B 9$)E:&'//5+',)8+F#/G:<&<

0#-1'"2:$HI;&D><.& 0#-1'"2D&- B 9$)E:&'//5+',)8+F#/G:<&<&

0#-1'"2:$HA;&DC<.& !"#$%"&$'()*#:J<.& !+,-*:-<.& !+,-*$,*.$-,/#$&(%$,00:'<.& 9"!(&B >&K&C.

=& L),!L$&MNOO.&

= =.

A computation organized as explicit tasks.

P#$+/!G)&@+",G#$,Q P#$+/!G)&@#5",L)'(Q P#$+/!G)&R,--S,'"2TFR

#$, ('#$:#$, 'L%+; +F'L&9'L%UVW<&4 #$,678,&L)". #?&:'L%+ @ A<&4&L),!L$&I.&= #$,678,&$ B

",L,5!/:'L%UVIW;&MNOO;&X<. 0#-1'"2D ' B 9$)E:,'"233'//5+',)8L55,:<<

0#-1'"2:$;&DL)"<.& ,'"233!+,-*$%((#$,*.$-,/#:'<.

",G33+5!, @@&R0#-5$'++#&5?&R&@@&$& @@ R&#"&R&@@&L)"&@@&",G33)$G/.

L),!L$&X. =

42

© 2008–2018 by the MIT 6.172 Lecturers

Page 43: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci in TBB

!"#$%&$'()"*'+)&,--. +/'""&0#-1'"23&*!-/#+&,'"2&4 *!-/#+3

+5$", #$,678,&$.&#$,678,9&+5$",&"!(.&

0#-1'"2:#$,678,&$8;&#$,678,9&"!(8<&3&

$:$8<;&"!(:"!(8<&4=&

,'"29&)>)+!,):<&4& #?:&$&@ A <&4

9"!(&B $. =&)/")&4

#$,678,&>;&C.& 0#-1'"2D ' B 9$)E:&'//5+',)8+F#/G:<&<

0#-1'"2:$HI;&D><.& 0#-1'"2D&- B 9$)E:&'//5+',)8+F#/G:<&<&

0#-1'"2:$HA;&DC<.& !"#$%"&$'()*#:J<.& !+,-*:-<.& !+,-*$,*.$-,/#$&(%$,00:'<.& 9"!(&B >&K&C.

=& L),!L$&MNOO.&

= =.

0#-1'"2 has an input parameter $ and an output parameter "!(.

P#$+/!G)&@+",G#$,Q P#$+/!G)&@#5",L)'(Q P#$+/!G)&R,--S,'"2TFR

#$, ('#$:#$, 'L%+; +F'L&9'L%UVW<&4 #$,678,&L)". #?&:'L%+ @ A<&4&L),!L$&I.&= #$,678,&$ B

",L,5!/:'L%UVIW;&MNOO;&X<. 0#-1'"2D ' B 9$)E:,'"233'//5+',)8L55,:<<

0#-1'"2:$;&DL)"<.& ,'"233!+,-*$%((#$,*.$-,/#:'<.

",G33+5!, @@&R0#-5$'++#&5?&R&@@&$& @@ R&#"&R&@@&L)"&@@&",G33)$G/.

L),!L$&X. =

43

© 2008–2018 by the MIT 6.172 Lecturers

Page 44: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci in TBB

!"#$%&$'()"*'+)&,--. +/'""&0#-1'"23&*!-/#+&,'"2&4 *!-/#+3

+5$", #$,678,&$.&#$,678,9&+5$",&"!(.& 0#-1'"2:#$,678,&$8;&#$,678,9&"!(8<&3&

$:$8<;&"!(:"!(8<&4=&

,'"29&)>)+!,):<&4& #?:&$&@ A <&4

9"!(&B $. =&)/")&4

#$,678,&>;&C.& 0#-1'"2D ' B 9$)E:&'//5+',)8+F#/G:<&<

0#-1'"2:$HI;&D><.& 0#-1'"2D&- B 9$)E:&'//5+',)8+F#/G:<&<&

0#-1'"2:$HA;&DC<.& !"#$%"&$'()*#:J<.& !+,-*:-<.& !+,-*$,*.$-,/#$&(%$,00:'<.& 9"!(&B >&K&C.

=& L),!L$&MNOO.&

= =.

*!-/#+&,'"2&4

.&

;&#$,678,9&"!(8<&

$:$8<;&"!(:"!(8<&4=&

The )>)+!,):<&function performs the

computation when thetask is started.

P#$+/!G)&@+",G#$,Q P#$+/!G)&@#5",L)'(Q P#$+/!G)&R,--S,'"2TFR

#$, ('#$:#$, 'L%+; +F'L&9'L%UVW<&4 #$,678,&L)". #?&:'L%+ @ A<&4&L),!L$&I.&= #$,678,&$ B

",L,5!/:'L%UVIW;&MNOO;&X<. 0#-1'"2D ' B 9$)E:,'"233'//5+',)8L55,:<<

0#-1'"2:$;&DL)"<.& ,'"233!+,-*$%((#$,*.$-,/#:'<.

",G33+5!, @@&R0#-5$'++#&5?&R&@@&$& @@ R&#"&R&@@&L)"&@@&",G33)$G/.

L),!L$&X. =

44

© 2008–2018 by the MIT 6.172 Lecturers

Page 45: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci in TBB

!"#$%&$'()"*'+)&,--. +/'""&0#-1'"23&*!-/#+&,'"2&4 *!-/#+3

+5$", #$,678,&$.&#$,678,9&+5$",&"!(.&

0#-1'"2:#$,678,&$8;&#$,678,9&"!(8<&3& $:$8<;&"!(:"!(8<&4=&

,'"29&)>)+!,):<&4& #?:&$&@ A <&4

9"!(&B $. =&)/")&4

#$,678,&>;&C.& 0#-1'"2D ' B 9$)E:&'//5+',)8+F#/G:<&<

0#-1'"2:$HI;&D><.& 0#-1'"2D&- B 9$)E:&'//5+',)8+F#/G:<&<&

0#-1'"2:$HA;&DC<.& !"#$%"&$'()*#:J<.& !+,-*:-<.& !+,-*$,*.$-,/#$&(%$,00:'<.& 9"!(&B >&K&C.

=& L),!L$&MNOO.&

= =.

P#$+/!G)&@+",G#$,Q P#$+/!G)&@#5",L)'(Q P#$+/!G)&R,--S,'"2TFR

#$, ('#$:#$, 'L%+; +F'L&9'L%UVW<&4 #$,678,&L)". #?&:'L%+ @ A<&4&L),!L$&I.&= #$,678,&$ B

",L,5!/:'L%UVIW;&MNOO;&X<. 0#-1'"2D ' B 9$)E:,'"233'//5+',)8L55,:<<

0#-1'"2:$;&DL)"<.& ,'"233!+,-*$%((#$,*.$-,/#:'<.

",G33+5!, @@&R0#-5$'++#&5?&R&@@&$& @@ R&#"&R&@@&L)"&@@&",G33)$G/.

L),!L$&X. =

#$,678,9&"!(8<&3&

$:$8<;&"!(:"!(8<&4=&

P#$+/!G)&@+",G#$,+",G#$,Q

P#$+/!G)&@#5",L)'(#5",L)'(Q

P#$+/!G)&R,--S,'"2TF,--S,'"2TFR

#$, ('#$:#$, 'L%+'L%+; +F'L&9'L%U

#$,678,&L)".

#?&: <&4&<&4& .&=

Recursively create two child tasks, ' and -.

45

© 2008–2018 by the MIT 6.172 Lecturers

Page 46: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci in TBB

!"#$%&'()*$+,'"#,-

!"#$%&'()*".+,/(01-

!"#$%&'()2,334,0+5672

"#, 10"#8"#, 0/9$: $70/);0/9<=>?)@ "#,ABC,)/(+D "E)80/9$ * F?)@)/(,&/#)GD)H "#,ABC,)# I

+,/,.&%80/9<=G>:)JKLL:)M?D N"3O0+5P 0 I ;#(Q8,0+5RR0%%.$0,(C/..,8??

N"3O0+58#:)P/(+?D) ,0+5RR!"#$%&'(()&#%*&$#+)80?D

+,'RR$.&, **)2N"3.#0$$").E)2)**)#) ** 2)"+)2)**)/(+)**)+,'RR(#'%D

/(,&/#)MD H

&+"#9)#01(+S0$(),33D $%0++)N"3O0+5R)S&3%"$),0+5)@ S&3%"$R

$.#+, "#,ABC,)#D) "#,ABC,;)$.#+,)+&1D) N"3O0+58"#,ABC,)#C:)"#,ABC,;)+&1C?)R)

#8#C?:)+&18+&1C?)@H)

,0+5;)(T($&,(8?)@) "E8)#)* F ?)@

;+&1)I #D H)(%+()@

"#,ABC,)T:)UD) N"3O0+5P 0 I ;#(Q8)0%%.$0,(C$7"%'8?)?

N"3O0+58#VG:)PT?D) N"3O0+5P)3 I ;#(Q8)0%%.$0,(C$7"%'8?)?)

N"3O0+58#VF:)PU?D) !,)&',-&.(/%)8W?D) !"#$%83?D) !"#$%&#%*&$#+)&-('&#0080?D) ;+&1)I T)X)UD

H) /(,&/#)JKLLD)

H HD

!"#$%&'() $+,'"#,$+,'"#,-

!"#$%&'()*".+,/(01".+,/(01-

!"#$%&'()2,334,0+567,334,0+5672

"#, 10"#8"#, 0/9$0/9$: $70/);0/9<=>?)@=>?)@

"#,ABC,)/(+D

"E)80/9$ * F?)@)?)@)/(,&/#)GD)H

"#,ABC,)# I

+,/,.&%80/9<=+,/,.&%80/9<=G>:)JKLL:)M?D

N"3O0+5P 0 I ;I ;#(Q8,0+5RR0%%.$0,(C/..,8??RR0%%.$0,(C/..,8??

N"3O0+58#:)PP/(+?D)

,0+5)@

"#,ABC,;)+&1C?)R)

#8#C?:)+&18+&1C?)@H)

8)0%%.$0,(C$7"%'8?)?8)0%%.$0,(C$7"%'8?)?

N"3O0+58#N"3O0+58#N"3O0+58#N"3O0+58#VGG:)PPT?D)T?D)

8)0%%.$0,(C$7"%'8?)?)8)0%%.$0,(C$7"%'8?)?)

N"3O0+5N"3O0+58#VF:)PU?D)

Set the number of tasks to wait for (2 children + 1

implicit for bookkeeping).

46 © 2008–2018 by the MIT 6.172 Lecturers

Page 47: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci in TBB

!"#$%&'()*$+,'"#,-

!"#$%&'()*".+,/(01-

!"#$%&'()2,334,0+5672

"#, 10"#8"#, 0/9$: $70/);0/9<=>?)@ "#,ABC,)/(+D "E)80/9$ * F?)@)/(,&/#)GD)H "#,ABC,)# I

+,/,.&%80/9<=G>:)JKLL:)M?D N"3O0+5P 0 I ;#(Q8,0+5RR0%%.$0,(C/..,8??

N"3O0+58#:)P/(+?D) ,0+5RR!"#$%&'(()&#%*&$#+)80?D

+,'RR$.&, **)2N"3.#0$$").E)2)**)#) ** 2)"+)2)**)/(+)**)+,'RR(#'%D

/(,&/#)MD H

&+"#9)#01(+S0$(),33D $%0++)N"3O0+5R)S&3%"$),0+5)@ S&3%"$R

$.#+, "#,ABC,)#D) "#,ABC,;)$.#+,)+&1D) N"3O0+58"#,ABC,)#C:)"#,ABC,;)+&1C?)R)

#8#C?:)+&18+&1C?)@H)

,0+5;)(T($&,(8?)@) "E8)#)* F ?)@

;+&1)I #D H)(%+()@

"#,ABC,)T:)UD) N"3O0+5P 0 I ;#(Q8)0%%.$0,(C$7"%'8?)?

N"3O0+58#VG:)PT?D) N"3O0+5P)3 I ;#(Q8)0%%.$0,(C$7"%'8?)?)

N"3O0+58#VF:)PU?D) !,)&',-&.(/%)8W?D) !"#$%83?D) !"#$%&#%*&$#+)&-('&#0080?D) ;+&1)I T)X)UD

H) /(,&/#)JKLLD)

H HD

!"#$%&'()*$+,'"#,$+,'"#,-

!"#$%&'()*".+,/(01".+,/(01-

!"#$%&'()2,334,0+567,334,0+5672

"#, 10"#8"#, 0/9$0/9$: $70/);0/9<=>?)@

"#,ABC,)/(+D

"E)80/9$ * F?)@)?)@)/(,&/#)GD)H

"#,ABC,)# I

+,/,.&%80/9<=+,/,.&%80/9<=G>:)JKLL:)M?D

N"3O0+5P 0 I ;I ;#(Q8,0+5RR0%%.$0,(C/..,8??

N"3O0+58#:)

,0+5RR!"#$%&'(()&#%*&$#+)!"#$%&'(()&#%*&$#+)80?D

S&3%"$),0+5)@

#D)

$.#+,)+&1D)

"#,ABC,)#C:)"#,ABC,;)+&1C?)R)

#8#C?:)+&18+&1C?)@H)

8?)@)

?)@

:)UD)

0 I ;#(Q8)0%%.$0,(C$7"%'8?)?8)0%%.$0,(C$7"%'8?)?

N"3O0+58#N"3O0+58#N"3O0+58#N"3O0+58#VGG:)PPT?D)T?D)

3 I ;#(Q8)0%%.$0,(C$7"%'8?)?)8)0%%.$0,(C$7"%'8?)?)

N"3O0+5N"3O0+58#VF:)PU?D)

!,)&',-&.(/%)8W?D)

Start task 3.

47 © 2008–2018 by the MIT 6.172 Lecturers

Page 48: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci in TBB

!"#$%&'()*$+,'"#,-

!"#$%&'()*".+,/(01-

!"#$%&'()2,334,0+5672

"#, 10"#8"#, 0/9$: $70/);0/9<=>?)@ "#,ABC,)/(+D "E)80/9$ * F?)@)/(,&/#)GD)H "#,ABC,)# I

+,/,.&%80/9<=G>:)JKLL:)M?D N"3O0+5P 0 I ;#(Q8,0+5RR0%%.$0,(C/..,8??

N"3O0+58#:)P/(+?D) ,0+5RR!"#$%&'(()&#%*&$#+)80?D

+,'RR$.&, **)2N"3.#0$$").E)2)**)#) ** 2)"+)2)**)/(+)**)+,'RR(#'%D

/(,&/#)MD H

&+"#9)#01(+S0$(),33D $%0++)N"3O0+5R)S&3%"$),0+5)@ S&3%"$R

$.#+, "#,ABC,)#D) "#,ABC,;)$.#+,)+&1D) N"3O0+58"#,ABC,)#C:)"#,ABC,;)+&1C?)R)

#8#C?:)+&18+&1C?)@H)

,0+5;)(T($&,(8?)@) "E8)#)* F ?)@

;+&1)I #D H)(%+()@

"#,ABC,)T:)UD) N"3O0+5P 0 I ;#(Q8)0%%.$0,(C$7"%'8?)?

N"3O0+58#VG:)PT?D) N"3O0+5P)3 I ;#(Q8)0%%.$0,(C$7"%'8?)?)

N"3O0+58#VF:)PU?D) !,)&',-&.(/%)8W?D) !"#$%83?D) !"#$%&#%*&$#+)&-('&#0080?D) ;+&1)I T)X)UD

H) /(,&/#)JKLLD)

H HD

!"#$%&'() $+,'"#,$+,'"#,-

!"#$%&'()*".+,/(01".+,/(01-

!"#$%&'()2,334,0+567,334,0+5672

"#, 10"#8"#, 0/9$0/9$: $70/);0/9<=>?)@=>?)@

"#,ABC,)/(+D

"E)80/9$ * F?)@)?)@)/(,&/#)GD)H

"#,ABC,)# I

+,/,.&%80/9<=+,/,.&%80/9<=G>:)JKLL:)M?D

N"3O0+5P 0 I ;I ;#(Q8,0+5RR0%%.$0,(C/..,8??RR0%%.$0,(C/..,8??

N"3O0+58#:)PP/(+?D)

,0+5RR!"#$%&'(()&#%*&$#+)!"#$%&'(()&#%*&$#+)80?D

+&1C?)R)

8)0%%.$0,(C$7"%'8?)?

N"3O0+58#N"3O0+58#VGG:)PPT?D)T?D)

8)0%%.$0,(C$7"%'8?)?)

8#VF:)PU?D)

Start task 0 and wait for both 0 and 3 to finish.

48 © 2008–2018 by the MIT 6.172 Lecturers

Page 49: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci in TBB

!"#$%&'()*$+,'"#,-

!"#$%&'()*".+,/(01-

!"#$%&'()2,334,0+5672

"#, 10"#8"#, 0/9$: $70/);0/9<=>?)@ "#,ABC,)/(+D "E)80/9$ * F?)@)/(,&/#)GD)H "#,ABC,)# I

+,/,.&%80/9<=G>:)JKLL:)M?D N"3O0+5P 0 I ;#(Q8,0+5RR0%%.$0,(C/..,8??

N"3O0+58#:)P/(+?D) ,0+5RR!"#$%&'(()&#%*&$#+)80?D

+,'RR$.&, **)2N"3.#0$$").E)2)**)#) ** 2)"+)2)**)/(+)**)+,'RR(#'%D

/(,&/#)MD H

&+"#9)#01(+S0$(),33D $%0++)N"3O0+5R)S&3%"$),0+5)@ S&3%"$R

$.#+, "#,ABC,)#D) "#,ABC,;)$.#+,)+&1D) N"3O0+58"#,ABC,)#C:)"#,ABC,;)+&1C?)R)

#8#C?:)+&18+&1C?)@H)

,0+5;)(T($&,(8?)@) "E8)#)* F ?)@

;+&1)I #D H)(%+()@

"#,ABC,)T:)UD) N"3O0+5P 0 I ;#(Q8)0%%.$0,(C$7"%'8?)?

N"3O0+58#VG:)PT?D) N"3O0+5P)3 I ;#(Q8)0%%.$0,(C$7"%'8?)?)

N"3O0+58#VF:)PU?D) !,)&',-&.(/%)8W?D) !"#$%83?D) !"#$%&#%*&$#+)&-('&#0080?D) ;+&1)I T)X)UD

H) /(,&/#)JKLLD)

H HD

!"#$%&'()*$+,'"#,$+,'"#,-

!"#$%&'()*".+,/(01".+,/(01-

!"#$%&'()2,334,0+567,334,0+5672

"#, 10"#8"#, 0/9$0/9$: $70/);0/9<=>?)@=>?)@

"#,ABC,)/(+D

"E)80/9$ * F?)@)?)@)/(,&/#)GD)H

"#,ABC,)# I

+,/,.&%80/9<=+,/,.&%80/9<=G>:)JKLL:)M?D

N"3O0+5P 0 I ;I ;#(Q8,0+5RR0%%.$0,(C/..,8??RR0%%.$0,(C/..,8??

N"3O0+58#:)PP/(+?D)

,0+5RR!"#$%&'(()&#%*&$#+)!"#$%&'(()&#%*&$#+)80?D

+,'RR$.&, **)2N"3.#0$$").E)2)**)#)

2)"+)2)2)"+)2) +,'RR(#'%RR(#'%

S&3%"$),0+5)@

D)

:)"#,ABC,;)+&1C?)R)

#8#C?:)+&18+&1C?)@H)

#(Q8)0%%.$0,(C$7"%'8?)?8)0%%.$0,(C$7"%'8?)?

N"3O0+58#N"3O0+58#N"3O0+58#N"3O0+58# :)PPT?D)T?D)

#(Q8)0%%.$0,(C$7"%'8?)?)8)0%%.$0,(C$7"%'8?)?)

N"3O0+5N"3O0+58#VF:)PU?D)

W?D)

!"#$%&#%*&$#+)&-('&#0080?D)80?D)

Add the results together to produce the final output.

49 © 2008–2018 by the MIT 6.172 Lecturers

Page 50: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci in TBB

!"#$%&'()*$+,'"#,-

!"#$%&'()*".+,/(01-

!"#$%&'()2,334,0+5672

"#, 10"#8"#, 0/9$: $70/);0/9<=>?)@ "#,ABC,)/(+D "E)80/9$ * F?)@)/(,&/#)GD)H "#,ABC,)# I

+,/,.&%80/9<=G>:)JKLL:)M?D N"3O0+5P 0 I ;#(Q8,0+5RR0%%.$0,(C/..,8??

N"3O0+58#:)P/(+?D) ,0+5RR!"#$%&'(()&#%*&$#+)80?D

+,'RR$.&, **)2N"3.#0$$").E)2)**)#) ** 2)"+)2)**)/(+)**)+,'RR(#'%D

/(,&/#)MD H

&+"#9)#01(+S0$(),33D $%0++)N"3O0+5R)S&3%"$),0+5)@ S&3%"$R

$.#+, "#,ABC,)#D) "#,ABC,;)$.#+,)+&1D) N"3O0+58"#,ABC,)#C:)"#,ABC,;)+&1C?)R)

#8#C?:)+&18+&1C?)@H)

,0+5;)(T($&,(8?)@) "E8)#)* F ?)@

;+&1)I #D H)(%+()@

"#,ABC,)T:)UD) N"3O0+5P 0 I ;#(Q8)0%%.$0,(C$7"%'8?)?

N"3O0+58#VG:)PT?D) N"3O0+5P)3 I ;#(Q8)0%%.$0,(C$7"%'8?)?)

N"3O0+58#VF:)PU?D) !,)&',-&.(/%)8W?D) !"#$%83?D) !"#$%&#%*&$#+)&-('&#0080?D) ;+&1)I T)X)UD

H) /(,&/#)JKLLD)

H HD

!"#$%&'()

!"#$%&'()

"#,

I ;#(Q8)0%%.$0,(C$7"%'8?)?8)0%%.$0,(C$7"%'8?)?8)0%%.$0,(C$7"%'8?)?

N"3O0+58#N"3O0+58#N"3O0+58#N"3O0+58#VGG:)PPT?D)T?D)

I ;#(Q8)0%%.$0,(C$7"%'8?)?)8)0%%.$0,(C$7"%'8?)?)

N"3O0+5N"3O0+58#VF:)PU?D)

8W?D)

Create root task; spawn and wait.

50 © 2008–2018 by the MIT 6.172 Lecturers

Page 51: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Other TBB Features

∙ TBB provides many C++ templates to expresscommon patterns simply, such as• parallel_for for loop parallelism,• parallel_reduce for data aggregation,• pipeline and filter for software pipelining.

∙ TBB provides concurrent container classes, whichallow multiple threads to safely access andupdate items in the container concurrently.

∙ TBB also provides a variety of mutual-exclusionlibrary functions, including locks and atomicupdates.

© 2008–2018 by the MIT 6.172 Lecturers 51

Page 52: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Outline

• Shared-Memory Hardware• Concurrency Platforms

• Pthreads (and WinAPI Threads)• Threading Building Blocks• OpenMP• Cilk

© 2008–2018 by the MIT 6.172 Lecturers 52

Page 53: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

OpenMP

∙ Specification by an industry consortium.∙ Several compilers available, both open-

source and proprietary, including GCC, ICC,Clang, and Visual Studio.

∙ Linguistic extensions to C/C++ and Fortran inthe form of compiler pragmas.

∙ Runs on top of native threads.∙ Supports loop parallelism, task parallelism,

and pipeline parallelism.

© 2008–2018 by the MIT 6.172 Lecturers 53

Page 54: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci in OpenMP

!"#$%&# '!()!"#$%&#*"+*,* !' )" -*.+*,* /0#1/"*"2*

3*0450 , !"#$%&#*67*82

9:/;<=; !"# $%&'(&)%*+,-./01 6 > '!()"?@+2

9:/;<=; !"# $%&'(&)%*+,-2/01 8 >*'!()"?.+2

9:/;<=; !"# $%&'3%4$ /0#1/"*)6 A*8+2

3 3

© 2008–2018 by the MIT 6.172 Lecturers 54

Page 55: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci in OpenMP

!"#$%&# '!()!"#$%&#*"+*,* !' )" -*.+*,* /0#1/"*"2*

3*0450 , !"#$%&#*67*82

9:/;<=; !"# $%&'(&)%*+,-./01 6 > '!()"?@+2

9:/;<=; !"# $%&'(&)%*+,-2/01 8 >*'!()"?.+2

9:/;<=; !"# $%&'3%4$ /0#1/"*)6 A*8+2

3 3

Compiler directive.

© 2008–2018 by the MIT 6.172 Lecturers 55 56

Page 56: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci in OpenMP

!"#$%&# '!()!"#$%&#*"+*,* !' )" -*.+*,* /0#1/"*"2*

3*0450 , !"#$%&#*67*82

9:/;<=; !"# $%&'(&)%*+,-./01 6 > '!()"?@+2

9:/;<=; !"# $%&'(&)%*+,-2/01 8 >*'!()"?.+2

9:/;<=; !"# $%&'3%4$ /0#1/"*)6 A*8+2

3 3

The following statement is an

independent task.

© 2008–2018 by the MIT 6.172 Lecturers 56

Page 57: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci in OpenMP

!"#$%&# '!()!"#$%&#*"+*,* !' )" -*.+*,* /0#1/"*"2*

3*0450 , !"#$%&#*67*82

9:/;<=; !"# $%&'(&)%*+,-./01 6 > '!()"?@+2

9:/;<=; !"# $%&'(&)%*+,-2/01 8 >*'!()"?.+2

9:/;<=; !"# $%&'3%4$ /0#1/"*)6 A*8+2

3 3

Sharing of memory is managed explicitly.

© 2008–2018 by the MIT 6.172 Lecturers 57

Page 58: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Fibonacci in OpenMP

!"#$%&# '!()!"#$%&#*"+*,* !' )" -*.+*,* /0#1/"*"2*

3*0450 , !"#$%&#*67*82

9:/;<=; !"# $%&'(&)%*+,-./01 6 > '!()"?@+2

9:/;<=; !"# $%&'(&)%*+,-2/01 8 >*'!()"?.+2

9:/;<=; !"# $%&'3%4$ /0#1/"*)6 A*8+2

3 3 Wait for the two

tasks to complete before continuing.

© 2008–2018 by the MIT 6.172 Lecturers 58

Page 59: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Other OpenMP Features

∙ OpenMP provides many pragma directivesto express common patterns, such as• parallel for for loop parallelism,• reduction for data aggregation,• directives for scheduling and data sharing.

∙ OpenMP supplies a variety ofsynchronization constructs, such asbarriers, atomic updates, and mutual-exclusion (mutex) locks.

© 2008–2018 by the MIT 6.172 Lecturers 59

Page 60: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Outline

• Shared-Memory Hardware • Concurrency Platforms Pthreads (and WinAPI Threads) Threading Building Blocks OpenMP Cilk

© 2008–2018 by the MIT 6.172 Lecturers 60

Page 61: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Intel Cilk Plus

∙ The “Cilk” part is a small set of linguistic extensionsto C/C++ to support fork-join parallelism. (The“Plus” part supports vector parallelism.)

∙ Developed originally by Cilk Arts, an MIT spin-off,which was acquired by Intel in July 2009.

∙ Based on the award-winning Cilk multithreadedlanguage developed at MIT.

∙ Features a provably efficient work-stealing scheduler.∙ Provides a hyperobject library for parallelizing code

with global variables.∙ Ecosystem includes the Cilkscreen race detector and

Cilkview scalability analyzer.

© 2008–2018 by the MIT 6.172 Lecturers 61

Page 62: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Tapir/LLVM and Cilk

6.172 will be using the Tapir/LLVM compiler, which supports the Cilk subset of Cilk Plus. ● Tapir/LLVM was developed at MIT by Tao B. Schardl,

William Moses, and Charles Leiserson.● Tapir/LLVM generally produces better code relative

to its base compiler than other implementations ofCilk.

● Tapir/LLVM uses Intel’s Cilk Plus runtime system.● Tapir/LLVM also supports more general features,

such as the spawning of code blocks.

© 2008–2018 by the MIT 6.172 Lecturers 62

Page 63: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Nested Parallelism in Cilk

!"#$%&# '!()!"#$%&#*"+*,* !' )" -*.+*,*

/0#1/"*"2* 3*0450 ,

!"#$%&#*67*82 6 9 !"#$%&'()* '!()":;+2 8 9*'!()":.+2 !"#$%&+*!2*** /0#1/"*)6 <*8+2

3 3

!"#$%&#*!"#$%&#*"+*,*+*,*2*

67*82!"#$%&'()* '!()":;+2+2

+*,*The named child function may execute in parallel with the parent caller.

.+22***<*<*88+2+2+2+2

Control cannot passthis point until all spawned children have returned.

Cilk keywords grant permission for parallel execution. They do not command parallel execution.

© 2008–2018 by the MIT 6.172 Lecturers 63

Page 64: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Loop Parallelism in Cilk

Example: a11 a12 ! a1n a11 a21 ! an1

In-place a21 a22 ! a2n a12 a22 ! an2

matrix " " # " " " # "transpose an1 an2 ! ann a1n a2n ! ann

A AT

The iterations of a !"#$%&'( loopexecute in parallel.

!!"#$%#&'(")*$"+),-"./"$,0"1 !"#$%&'( )"*+ ",-./"0*./11"2/3

&'(/)"*+/4,5./40"./1142/3 6'78#9/+9:;/, <=">=4>. <=">=4>/, <=4>=">. <=4>=">/, +9:;.

? ?

© 2008–2018 by the MIT 6.172 Lecturers 64

Page 65: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Reducers in Cilk

Example: Parallel summation

!"#$%"&'()*"%(#!+ ,(-.!"#$%&'( /$"0 $,-.($1".(22$3(4

#!+(2,($.567$"08/9:';"9<#!+3.

!"#$%"&'()*"%(#!+ ,(-. 8*7 /$"0 $,-.($1".(22$3(4

#!+(2,($. 5 67$"08/9:';"9<#!+3.

=>?@A=ABCDE=CBAFGHDD/#!+<(!"#$%"&'()*"%<(-3. =>?@A=ABCI>JKCBABCDE=CB/#!+3. L$)MA8*7/$"0 $,-.($1".(22$3(4

BCDE=CBAN>CO/#!+3(2,($. 5 67$"08/9KP&(#!+($#(:8;"9<(BCDE=CBAN>CO/#!+33. =>?@A=AEQBCI>JKCBABCDE=CB/#!+3.

© 2008–2018 by the MIT 6.172 Lecturers 65

Page 66: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Reducers in Cilk

Reducers can be created for monoids (algebraic structures with an associative binary operation and an identity element)

Cilk has several predefined reducers (add, multiply, min, max, and, or, xor, etc.)

© 2008–2018 by the MIT 6.172 Lecturers 66

Page 67: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Serial Semantics Cilk source serial elision

!"#$%&# '!()!"#$%&#*"+*,* !' )" -*.+*,*

/0#1/"*"2* 3*0450 ,

!"#$%&#*67*82 6 9 !"#$%)*+,- '!()":;+2 8 9*'!()":.+2 !"#$%).-!2*** /0#1/"*)6 <*8+2

3 3

!"#$%&# '!()!"#$%&#*"+*,* !' )" -*.+*,*

/0#1/"*"2* 3*0450 ,

!"#$%&#*67*82 6 9 '!()":;+2 8 9*'!()":.+2

/0#1/"*)6 <*8+2 3

3

The serial elision of a Cilk program is always a legal interpretation of the program’s semantics.

Remember, Cilk keywords grant permission for parallel execution. They do not command parallel execution.

=>0'!"0 !"#$%&'( '?/ =>0'!"0 !"#$%)*+,-=>0'!"0 !"#$%).-!

To obtain the serial elision:

© 2008–2018 by the MIT 6.172 Lecturers 67

Page 68: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Scheduling

! The Cilk concurrencyplatform allows the programmer to express logical parallelism in an application.

! The Cilk scheduler mapsthe executing programonto the processor coresdynamically at runtime.

! Cilk’s work-stealingscheduling algorithm isprovably efficient.

!"#$%&# '!()!"#$%&#*"+*,* !' )" -*.+*,*

/0#1/"*"2* 3*0450 ,

!"#$%&#*67*82 6 9 !"#$%&'()* '!()":;+2 8 9*'!()":.+2 !"#$%&+*!2*** /0#1/"*)6 <*8+2

3 3

Memory I/O

$

P

$

P

$

P

$$ $$ $$

Network

© 2008–2018 by the MIT 6.172 Lecturers 68

Page 69: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Cilk Platform

© 2008–2018 by the MIT 6.172 Lecturers

Parallel performance

Cilk compiler

!"#$%&# '!()!"#$%&#*"+*,* !' )" -*.+*,*/0#1/"*"2*3* 0450 , !"#$%&#*67*82 6 9 !"#$%&'()* '!()":;+2 8 9*'!()":.+2 !"#$%&+*!2*** /0#1/"*)6 <*8+2

3 3 Cilk source

P! P P

Binary

Programinput

69

Page 70: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Serial Testing

Reliable single-threaded code

C/C++ compiler

!"#$%&# '!()!"#$%&#*"+*,* !' )" -*.+*,*/0#1/"*"2*3* 0450 , !"#$%&#*67*82 6 9 !"#$%&'()* '!()":;+2 8 9*'!()":.+2 !"#$%&+*!2*** /0#1/"*)6 <*8+2

3 3 Cilk source

!"#$%&# '!()!"#$%&#*"+*,* !' )" -*.+*,*/0#1/"*"2*3* 0450 , !"#$%&# 6*9 '!()":;+2 !"#$%&# 8 9*'!()":.+2 /0#1/"*)6 <*8+2

3 3 Serial elision

Binary

P Serial regression

tests

70 © 2008–2018 by the MIT 6.172 Lecturers

Page 71: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Alternative Serial Testing

The parallel programexecuting on one core should behave exactly the same as the execution of the serial elision.

© 2008–2018 by the MIT 6.172 Lecturers

Reliable single-threaded code

Cilk compiler

!"#$%&# '!()!"#$%&#*"+*,* !' )" -*.+*,*/0#1/"*"2*3* 0450 , !"#$%&#*67*82 6 9 !"#$%&'()* '!()":;+2 8 9*'!()":.+2 !"#$%&+*!2*** /0#1/"*)6 <*8+2

3 3 Cilk source

P

Binary

Serial regression

tests

71

Page 72: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Parallel Testing

Cilksan finds and localizes determinacy races.

© 2008–2018 by the MIT 6.172 Lecturers

Reliable multi-threaded code

Cilk compiler with Cilksan

!"#$%&# '!()!"#$%&#*"+*,* !' )" -*.+*,*/0#1/"*"2*3* 0450 , !"#$%&#*67*82 6 9 !"#$%&'()* '!()":;+2 8 9*'!()":.+2 !"#$%&+*!2*** /0#1/"*)6 <*8+2

3 3 Cilk source

Parallel regression

tests P

Binary

72

Page 73: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Scalability Analysis

Cilkscale analyzes how well your program will scale to larger machines.

© 2008–2018 by the MIT 6.172 Lecturers

Scalability report

Cilk compiler with Cilkscale

!"#$%&# '!()!"#$%&#*"+*,* !' )" -*.+*,*/0#1/"*"2*3* 0450 , !"#$%&#*67*82 6 9 !"#$%&'()* '!()":;+2 8 9*'!()":.+2 !"#$%&+*!2*** /0#1/"*)6 <*8+2

3 3 Cilk source

Program input

P

Binary

73

Page 74: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

Summary

• Processors today have multiple cores, andobtaining high performance requires parallelprogramming

• Programming directly on processor cores ispainful and error-prone.

• Cilk abstracts processor cores, handlessynchronization and communication protocols,and performs provably efficient load balancing.

• Project 2: Parallel screen saver using Cilk

© 2008–2018 by the MIT 6.172 Lecturers 74

Page 75: 01 L 6 Multicore Programming · Multicore Programming Julian Shun 1. Multicore Processors 2 Q Why do semicon-ductor vendors provide chips with multiple processor . cores? A Because

MIT OpenCourseWare https://ocw.mit.edu

6.172 Performance Engineering of Software Systems Fall 2018

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.

75