CS3383 Unit 4: Dynamic Multithreaded Algorithms
TRANSCRIPT

David Bremner
March 25, 2018
Outline
Dynamic Multithreaded Algorithms
▶ Fork-Join Model
▶ Span, Work, And Parallelism
▶ Parallel Loops
▶ Scheduling
▶ Race Conditions
Introduction to Parallel Algorithms
Dynamic Multithreading
▶ Also known as the fork-join model
▶ Shared memory, multicore
▶ Cormen et al., 3rd edition, Chapter 27
Nested Parallelism
▶ Spawn a subroutine, carry on with other work.
▶ Similar to fork in POSIX.
Parallel Loop
▶ Iterations of a for loop can execute in parallel.
▶ Like OpenMP.
Cilk+
▶ The multithreaded model is based on Cilk+, available in the latest versions of gcc.
▶ Programmer specifies possible parallelism
▶ Runtime system takes care of mapping to OS threads
▶ Cilk+ contains several more features than our model, e.g. parallel vector and array operations.
▶ Similar primitives are available in java.util.concurrent
Writing parallel (pseudo)-code
Keywords
parallel: run the loop (potentially) concurrently
spawn: run the procedure (potentially) concurrently
sync: wait for all spawned children to complete

Serialization
▶ Removing the keywords from parallel code yields correct serial code
▶ Adding parallel keywords to correct serial code might break it:
  ▶ missing sync
  ▶ loop iterations not independent
Fibonacci Example
function Fib(𝑛)
    if 𝑛 ≤ 1 then
        return 𝑛
    else
        𝑥 = Fib(𝑛 − 1)
        𝑦 = Fib(𝑛 − 2)
        return 𝑥 + 𝑦
    end if
end function
▶ Code in C, Java, Clojure and Racket available from http://www.cs.unb.ca/~bremner/teaching/cs3383/examples
Fibonacci Example (parallel version)
function Fib(𝑛)
    if 𝑛 ≤ 1 then
        return 𝑛
    else
        𝑥 = spawn Fib(𝑛 − 1)
        𝑦 = Fib(𝑛 − 2)
        sync
        return 𝑥 + 𝑦
    end if
end function
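The spawn/sync pattern above can be sketched in runnable form. A minimal illustration, assuming Python's concurrent.futures as a stand-in for the runtime (the course examples linked below are in C, Java, Clojure and Racket; this sketch is not one of them). A serial cutoff keeps the number of spawned tasks small so recursive submissions cannot exhaust the thread pool.

```python
# Sketch of the parallel Fib pseudocode: "spawn" becomes pool.submit,
# "sync" becomes Future.result(). CUTOFF is a granularity control (an
# assumption, not in the pseudocode) that falls back to serial recursion.
from concurrent.futures import ThreadPoolExecutor

CUTOFF = 10  # below this size, recurse serially

def fib_serial(n):
    return n if n <= 1 else fib_serial(n - 1) + fib_serial(n - 2)

def fib(pool, n):
    if n < CUTOFF:
        return fib_serial(n)
    x = pool.submit(fib, pool, n - 1)  # "spawn": child may run concurrently
    y = fib(pool, n - 2)               # parent carries on with other work
    return x.result() + y              # "sync": wait for the spawned child

with ThreadPoolExecutor(max_workers=16) as pool:
    print(fib(pool, 13))  # 233
```

Note the cutoff also bounds the number of submitted tasks below max_workers here; without it, deeply nested submissions blocking on result() can deadlock a fixed-size thread pool.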
Computation DAG
Strands: sequential instructions with no parallel, spawn, return from spawn, or sync.

function Fib(𝑛)
    if 𝑛 ≤ 1 then          ▷ strand
        return 𝑛
    else
        𝑥 = spawn Fib(𝑛 − 1)
        𝑦 = Fib(𝑛 − 2)      ▷ strand
        sync
        return 𝑥 + 𝑦        ▷ strand
    end if
end function

[Figure: computation DAG for P-FIB(4); the nodes are the strands of the recursive calls P-FIB(4), P-FIB(3), P-FIB(2), P-FIB(1), and P-FIB(0).]
Computation DAG (legend)
nodes: strands
down edges: spawn
up edges: return
horizontal edges: sequential execution
critical path: longest path in the DAG
span: weighted length of the critical path ≡ lower bound on time
Work and Speedup
𝑇1: work, i.e. sequential time.
𝑇𝑝: time on 𝑝 processors.

Work Law
𝑇𝑝 ≥ 𝑇1/𝑝
speedup ∶= 𝑇1/𝑇𝑝 ≤ 𝑝
Parallelism
𝑇𝑝: time on 𝑝 processors.
𝑇∞: span, time given unlimited processors.

With unlimited processors we could idle all but 𝑝 of them, so

𝑇𝑝 ≥ 𝑇∞    (1)

Best possible speedup:

parallelism = 𝑇1/𝑇∞ ≥ 𝑇1/𝑇𝑝 = speedup
Span and Parallelism Example
Assume strands are unit cost. For the P-FIB(4) DAG:
▶ 𝑇1 = 17
▶ 𝑇∞ = 8
▶ Parallelism = 17/8 = 2.125 for this input size.

[Figure: computation DAG for P-FIB(4).]
Composing span and work

[Figure: compositions of subcomputations 𝐴 and 𝐵 in series (𝐴 + 𝐵) and in parallel (𝐴‖𝐵).]

series: 𝑇∞(𝐴 + 𝐵) = 𝑇∞(𝐴) + 𝑇∞(𝐵)
parallel: 𝑇∞(𝐴‖𝐵) = max(𝑇∞(𝐴), 𝑇∞(𝐵))
series or parallel: 𝑇1 = 𝑇1(𝐴) + 𝑇1(𝐵)
Work of Parallel Fibonacci
Write 𝑇(𝑛) for 𝑇1 on input 𝑛.

𝑇(𝑛) = 𝑇(𝑛 − 1) + 𝑇(𝑛 − 2) + Θ(1)

Let 𝜙 ≈ 1.62 be the positive solution to 𝜙² = 𝜙 + 1.

We can show by induction (twice, once for each bound) that 𝑇(𝑛) ∈ Θ(𝜙ⁿ). For the upper bound, the inductive hypothesis is

𝑇(𝑛) ≤ 𝑎𝜙ⁿ − 𝑏    (I.H.)

Substituting the I.H. into the recurrence:

𝑇(𝑛) ≤ 𝑎(𝜙ⁿ⁻¹ + 𝜙ⁿ⁻²) − 2𝑏 + Θ(1)
     = 𝑎((𝜙 + 1)/𝜙²)𝜙ⁿ − 𝑏 + (Θ(1) − 𝑏)
     ≤ 𝑎((𝜙 + 1)/𝜙²)𝜙ⁿ − 𝑏    (for 𝑏 large enough)
     = 𝑎𝜙ⁿ − 𝑏    (since 𝜙² = 𝜙 + 1)
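As a quick numeric sanity check (an illustration, not from the slides), we can iterate the recurrence with the Θ(1) term taken as 1 and watch the ratio 𝑇(𝑛 + 1)/𝑇(𝑛) converge to 𝜙:

```python
# Iterate T(n) = T(n-1) + T(n-2) + 1 and observe the growth ratio
# approaching phi = (1 + sqrt(5))/2 ≈ 1.618, consistent with T(n) ∈ Θ(phi^n).
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    if n <= 1:
        return 1  # unit cost for the base cases Fib(0), Fib(1)
    return T(n - 1) + T(n - 2) + 1  # the Θ(1) overhead modeled as 1

print(round(T(31) / T(30), 3))  # 1.618
```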
Span and Parallelism of Fibonacci

𝑇∞(𝑛) = max(𝑇∞(𝑛 − 1), 𝑇∞(𝑛 − 2)) + Θ(1)
      = 𝑇∞(𝑛 − 1) + Θ(1)

Unrolling the recurrence into a sum, we get

𝑇∞(𝑛) ∈ Θ(𝑛)

parallelism = 𝑇1(𝑛)/𝑇∞(𝑛) = Θ(𝜙ⁿ/𝑛)

▶ So this is an inefficient way to compute Fibonacci, but a very parallel one.
Parallel Loops
parallel for 𝑖 = 1 to 𝑛 do
    statement...
    statement...
end for

▶ Run 𝑛 copies of the loop body in parallel, each with a local setting of 𝑖.
▶ Effectively an 𝑛-way spawn
▶ Can be implemented with spawn and sync
▶ Span: 𝑇∞(𝑛) = Θ(log 𝑛) + max𝑖 𝑇∞(iteration 𝑖)
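The "implemented with spawn and sync" claim can be sketched by splitting the index range in half recursively and spawning one half; the recursion tree has depth Θ(log 𝑛), which is exactly where the Θ(log 𝑛) term in the span comes from. A minimal sketch, assuming Python's concurrent.futures as the spawn/sync mechanism (an illustration, not the course runtime):

```python
# parallel_for(body, lo, hi): run body(i) for i in [lo, hi) by recursive
# halving. Each internal call spawns its left half and recurses into the
# right half itself, then syncs.
from concurrent.futures import ThreadPoolExecutor

def parallel_for(pool, body, lo, hi):
    if hi - lo == 1:
        body(lo)
        return
    mid = (lo + hi) // 2
    left = pool.submit(parallel_for, pool, body, lo, mid)  # spawn
    parallel_for(pool, body, mid, hi)
    left.result()                                          # sync

out = [0] * 8
with ThreadPoolExecutor(max_workers=16) as pool:
    parallel_for(pool, lambda i: out.__setitem__(i, i * i), 0, 8)
print(out)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Writing to distinct indices of a shared list keeps the iterations independent, so this loop is safe to parallelize.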
Parallel Matrix-Vector product
To compute 𝑦 = 𝐴𝑥 in parallel:

𝑦𝑖 = ∑_{𝑗=1}^{𝑛} 𝑎𝑖𝑗 𝑥𝑗

function RowMult(𝐴, 𝑥, 𝑦, 𝑖)
    𝑦𝑖 = 0
    for 𝑗 = 1 to 𝑛 do
        𝑦𝑖 = 𝑦𝑖 + 𝑎𝑖𝑗𝑥𝑗
    end for
end function

function Mat-Vec(𝐴, 𝑥, 𝑦)
    Let 𝑛 = rows(𝐴)
    parallel for 𝑖 = 1 to 𝑛 do
        RowMult(𝐴, 𝑥, 𝑦, 𝑖)
    end for
end function

𝑇1(𝑛) ∈ Θ(𝑛²)    (serialization)
𝑇∞(𝑛) = Θ(log 𝑛) [from the parallel for] + Θ(𝑛) [from RowMult]
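A minimal runnable rendering of Mat-Vec, assuming Python lists and an executor's map as a stand-in for the parallel for (an illustration; the pseudocode above is the course's actual model):

```python
# mat_vec: each row's inner product (RowMult) runs as one parallel-loop
# iteration; writes go to distinct slots y[i], so iterations are independent.
from concurrent.futures import ThreadPoolExecutor

def mat_vec(A, x):
    n = len(A)
    y = [0] * n

    def row_mult(i):  # the serial RowMult of the slides, 0-based
        y[i] = sum(A[i][j] * x[j] for j in range(len(x)))

    with ThreadPoolExecutor() as pool:
        list(pool.map(row_mult, range(n)))  # "parallel for" over rows
    return y

print(mat_vec([[1, 2], [3, 4]], [1, 1]))  # [3, 7]
```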
Parallel Matrix-Vector product (cont.)

▶ Why is RowMult not using parallel for?
Divide and Conquer Matrix-Vector product

function MVDC(A, x, y, f, t)
    if f == t then
        RowMult(A, x, y, f)
    else
        m ← ⌊(f + t)/2⌋
        spawn MVDC(A, x, y, f, m)
        MVDC(A, x, y, m + 1, t)
        sync
    end if
end function

Recursion tree over row ranges (n = 8): the root (1,8) splits into (1,4) and (5,8), then (1,2), (3,4), (5,6), (7,8), down to the leaves (1,1), (2,2), …, (8,8), one per row.
Divide and Conquer Matrix-Vector product: analysis

▶ T∞(n) = Θ(log n) (depth of the binary recursion tree)
▶ Θ(n) leaves (one per row), each doing Θ(n) serial work in RowMult
▶ Θ(n) interior nodes (binary tree), each doing Θ(1) work
▶ T₁(n) = Θ(n²)
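A hypothetical Python sketch of MVDC, modeling spawn as submitting the left half to an executor and sync as waiting on the returned future (the names `mvdc` and `mat_vec_dc` are mine):

```python
# Divide-and-conquer matrix-vector product over the row range [f, t].
from concurrent.futures import ThreadPoolExecutor

def mvdc(A, x, y, f, t, pool):
    if f == t:
        # leaf: RowMult on row f
        y[f] = sum(A[f][j] * x[j] for j in range(len(x)))
    else:
        m = (f + t) // 2
        left = pool.submit(mvdc, A, x, y, f, m, pool)  # spawn
        mvdc(A, x, y, m + 1, t, pool)                  # parent continues
        left.result()                                  # sync

def mat_vec_dc(A, x):
    y = [0] * len(A)
    # caution: recursive blocking submits can exhaust a small pool,
    # so this sketch is only intended for small matrices
    with ThreadPoolExecutor(max_workers=8) as pool:
        mvdc(A, x, y, 0, len(A) - 1, pool)
    return y
```

The two halves write into disjoint slices of `y`, so the spawned call and the parent's call are independent, matching the analysis above.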
Contents

Dynamic Multithreaded Algorithms
    Fork-Join Model
    Span, Work, And Parallelism
    Parallel Loops
    Scheduling
    Race Conditions
Scheduling

Scheduling Problem
    Abstractly: mapping threads to processors.
    Pragmatically: mapping logical threads to a thread pool.

Ideal Scheduler
    On-Line: no advance knowledge of when threads will spawn or complete.
    Distributed: no central controller.

▶ To simplify the analysis, we relax the second condition.
A greedy centralized scheduler

Maintain a ready queue of strands that are ready to run.

Scheduling Step
    Complete step: if ≥ p (# of processors) strands are ready, assign p strands to processors.
    Incomplete step: otherwise, assign all ready strands to processors.

▶ To simplify the analysis, split any non-unit strand into a chain of unit strands.
▶ Therefore, after one time step, we schedule again.
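The complete/incomplete-step rule can be simulated directly. The sketch below is illustrative (the DAG representation as predecessor sets is my own); it counts how many time steps a greedy scheduler needs on a DAG of unit strands:

```python
# Toy greedy scheduler: each step runs min(p, #ready) ready strands.
from collections import defaultdict

def greedy_schedule(preds, p):
    """preds maps each strand to the set of strands it waits on.
    Returns the number of time steps (T_p) used by greedy scheduling."""
    preds = {v: set(s) for v, s in preds.items()}  # local mutable copy
    succs = defaultdict(set)
    for v, s in preds.items():
        for u in s:
            succs[u].add(v)
    ready = {v for v, s in preds.items() if not s}  # sources of the DAG
    steps = 0
    while ready:
        # complete step if >= p strands are ready, else incomplete step
        batch = set(list(ready)[:p])
        ready -= batch
        steps += 1
        for u in batch:             # retire the executed strands
            for v in succs[u]:
                preds[v].discard(u)
                if not preds[v]:
                    ready.add(v)
    return steps
```

For example, 8 independent strands on p = 3 processors take ⌈8/3⌉ = 3 steps, while a chain of 4 strands takes 4 steps regardless of p, consistent with the bound T_p ≤ T₁/p + T∞ on the next slide.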
Optimal and Approximate Scheduling

Recall:

    T_p ≥ T₁/p   (work law)
    T_p ≥ T∞    (span law)

Therefore:

    T_p ≥ max(T₁/p, T∞) = opt

With the greedy algorithm we can achieve:

    T_p ≤ T₁/p + T∞ ≤ 2 max(T₁/p, T∞) = 2 × opt
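Plugging in sample numbers (my own, purely illustrative) shows how the greedy guarantee compares with the lower bound:

```python
# Illustrative values: work T1, span Tinf, and p processors.
T1, Tinf, p = 10**6, 10**3, 100

opt_lower = max(T1 / p, Tinf)   # no schedule can beat max(T1/p, Tinf)
greedy_upper = T1 / p + Tinf    # greedy achieves at most T1/p + Tinf

assert opt_lower == 10_000
assert greedy_upper == 11_000
assert greedy_upper <= 2 * opt_lower   # the 2-approximation guarantee
```

Here greedy is within 10% of optimal, much better than the worst-case factor of 2, because the span is small relative to T₁/p; that observation is made precise by parallel slackness below.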
Counting Complete Steps

▶ Let k be the number of complete steps.
▶ At each complete step we do p units of work.
▶ Every unit of work corresponds to one step of the serialization, so kp ≤ T₁.
▶ Therefore k ≤ T₁/p.
Counting Incomplete Steps

▶ Let G be the DAG of remaining strands.
▶ The ready queue is exactly the set of sources of G.
▶ An incomplete step runs all sources of G.
▶ Every longest path starts at a source (otherwise we could extend it).
▶ After an incomplete step, the length of the longest path shrinks by 1.
▶ So there can be at most T∞ incomplete steps.
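The key claim, that running all sources shortens the longest path by exactly one, can be checked on a small example (the DAG and helper names below are mine):

```python
# Longest path (counted in strands) and one incomplete step on a DAG
# given as predecessor sets.
def longest_path(preds):
    memo = {}
    def depth(v):  # longest chain of strands ending at v
        if v not in memo:
            memo[v] = 1 + max((depth(u) for u in preds[v]), default=0)
        return memo[v]
    return max((depth(v) for v in preds), default=0)

def run_sources(preds):
    # one incomplete step: execute every source, keep the rest
    sources = {v for v, s in preds.items() if not s}
    return {v: s - sources for v, s in preds.items() if v not in sources}

dag = {'a': set(), 'b': set(), 'c': {'a'}, 'd': {'a', 'b'}, 'e': {'c', 'd'}}
assert longest_path(dag) == 3        # a -> c -> e
after = run_sources(dag)             # runs sources a and b
assert longest_path(after) == 2      # c -> e (or d -> e)
```

Iterating `run_sources` drives the longest path down by one per step, so after at most T∞ incomplete steps nothing remains.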
Parallel Slackness

    parallel slackness = parallelism / p = T₁ / (p T∞)

    speedup = T₁/T_p ≤ T₁/T∞ = p × slackness

▶ If slackness < 1, then speedup < p.
▶ If slackness ≥ 1, linear speedup is achievable for the given number of processors.
Slackness and Scheduling

    slackness := T₁ / (p × T∞)

Theorem
For sufficiently large slackness, the greedy scheduler approaches time T₁/p.

Suppose

    T₁ / (p × T∞) ≥ c.

Then

    T∞ ≤ T₁ / (cp).    (2)

Recall that with the greedy scheduler,

    T_p ≤ T₁/p + T∞.

Substituting (2), we have

    T_p ≤ (T₁/p)(1 + 1/c).
Contents

Dynamic Multithreaded Algorithms
    Fork-Join Model
    Span, Work, And Parallelism
    Parallel Loops
    Scheduling
    Race Conditions
Race Conditions

Non-Determinism
▶ The result varies from run to run.
▶ Sometimes OK (e.g. in certain randomized algorithms).
▶ Mostly a bug.

Example

x ← 0
parallel for i ← 1 to 2 do
    x ← x + 1
end for

▶ This is nondeterministic unless incrementing x is atomic.
Racy execution

The increment example unrolls into this computation DAG (two strands between the fork after x ← 0 and the join before print x):

          x ← 0
         /     \
    r1 ← x     r2 ← x
    incr r1    incr r2
    x ← r1     x ← r2
         \     /
         print x

▶ All possible topological sorts of this DAG are valid execution orders.
▶ In particular, it is not hard for both loads to complete before either store.
▶ In practice there are various synchronization strategies (locks, etc.).
▶ Here we will instead insist that parallel strands are independent.
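Since every topological sort is a valid order, we can simply enumerate them. This sketch (the representation is mine) runs all interleavings of the two load/incr/store sequences and collects the possible final values of x:

```python
# Each thread executes load, incr, store in order; an interleaving is a
# choice of which 3 of the 6 time slots belong to thread 1.
from itertools import combinations

OPS = ['load', 'incr', 'store']

def execute(slots_for_t1):
    x = 0
    reg = {1: 0, 2: 0}   # per-thread register r1 / r2
    step = {1: 0, 2: 0}  # how far each thread has progressed
    for slot in range(6):
        t = 1 if slot in slots_for_t1 else 2
        op = OPS[step[t]]
        step[t] += 1
        if op == 'load':
            reg[t] = x
        elif op == 'incr':
            reg[t] += 1
        else:  # store
            x = reg[t]
    return x

finals = {execute(set(c)) for c in combinations(range(6), 3)}
```

Both 2 (one increment happens entirely after the other) and 1 (a lost update: both loads precede either store) are reachable, so `finals == {1, 2}`.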
We can write bad code with spawn too

sum(i, j)
    if (i > j) return;
    if (i == j)
        x++;            // x is shared state
    else
        m = (i + j) / 2;
        spawn sum(i, m);
        sum(m + 1, j);
        sync;

▶ Here we have the same nondeterministic interleaving of reads and writes of x.
▶ The style is a bit unnatural; in particular, we are not using the return value of spawn at all.
Being more functional helps

sum(i, j)
    if (i > j) return 0;
    if (i == j) return 1;
    m ← (i + j) / 2;
    left ← spawn sum(i, m);
    right ← sum(m + 1, j);
    sync;
    return left + right;

▶ Each strand writes into different variables.
▶ sync is used as a barrier to serialize before combining the results.
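The race-free pattern translates directly to a hypothetical Python sketch: spawn becomes submitting to an executor, sync becomes `Future.result()`, and each strand returns its value instead of updating a shared x (the names `par_count` and `count_range` are mine):

```python
# Functional divide-and-conquer: counts the integers in [i, j],
# mirroring the slide's sum(i, j) with one "x++" per leaf.
from concurrent.futures import ThreadPoolExecutor

def par_count(i, j, pool):
    if i > j:
        return 0
    if i == j:
        return 1
    m = (i + j) // 2
    left = pool.submit(par_count, i, m, pool)   # spawn
    right = par_count(m + 1, j, pool)           # parent continues
    return left.result() + right                # sync, then combine

def count_range(i, j):
    # small inputs only: deep recursive submits can exhaust a bounded pool
    with ThreadPoolExecutor(max_workers=8) as pool:
        return par_count(i, j, pool)
```

Because each strand only reads its arguments and writes its own local `left` and `right`, no two parallel strands touch the same location, so the result is deterministic: `count_range(1, 10)` is always 10.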
Single Writer races

x ← spawn foo(x)
y ← foo(x)
sync

▶ Arguments to spawned routines are evaluated in the parent context,
▶ but this isn't enough to be race free:
▶ which value of x is passed to the second call of foo depends on how long the first one takes.