Lecture 8 – Collective Pattern
Collectives Pattern
CS 472 Concurrent & Parallel Programming
University of Evansville
Selection of slides from CIS 410/510 Introduction to Parallel Computing
Department of Computer and Information Science, University of Oregon
http://ipcc.cs.uoregon.edu/curriculum.html
Announcements
• No class next Thursday, September 21; the instructor will be traveling to a conference. That day will be the second work day on the lab project associated with today’s topic.
CS 472 Concurrent & Parallel Programming Lecture 5 Collectives
Collectives
• Collective operations deal with a collection of data as a whole, rather than as separate elements
• Collective patterns include:
  • Reduce
  • Scan
  • Partition
  • Scatter
  • Gather
Reduce and Scan will be covered in this lecture
Reduce
• Reduce is used to combine a collection of elements into one summary value
• A combiner function combines elements pairwise
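• A combiner function only needs to be associative to be parallelizable: associativity lets the pairwise combinations be regrouped into a tree instead of a left-to-right chain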
• Example combiner functions:
  • Addition
  • Multiplication
  • Maximum / Minimum
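A minimal serial sketch of what reduce computes, using Python's `functools.reduce` with the example combiners above. Each combiner is paired with its identity value (0 for addition, 1 for multiplication, -inf / +inf for maximum / minimum). The helper name `reduce_with` is hypothetical, and this shows only the semantics, not a parallel implementation.

```python
from functools import reduce
import operator

def reduce_with(combiner, xs, identity):
    """Fold xs into one summary value by applying combiner pairwise, left to right."""
    return reduce(combiner, xs, identity)

data = [3, 1, 4, 1, 5, 9, 2, 6]
total = reduce_with(operator.add, data, 0)          # addition
product = reduce_with(operator.mul, data, 1)        # multiplication
largest = reduce_with(max, data, float("-inf"))     # maximum
smallest = reduce_with(min, data, float("inf"))     # minimum
```

Here `total` is 31, `product` is 6480, `largest` is 9 and `smallest` is 1.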
Reduce
[Figure: Serial Reduction | Parallel Reduction]
Reduce
• Vectorization
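Vectorized reductions typically keep one partial result per SIMD lane and combine the lanes once at the end (a "horizontal" reduction). Python has no SIMD, so the hypothetical `lane_sum` below only simulates an assumed width of 4 lanes with a list of accumulators. Lane k sums elements k, k+4, k+8, …, which reorders the elements as well as regrouping them, so this particular layout relies on the combiner being commutative as well as associative (true for addition).

```python
def lane_sum(xs, lanes=4):
    """Simulate a SIMD-style sum that keeps `lanes` independent accumulators."""
    acc = [0] * lanes                          # one partial sum per vector lane
    for start in range(0, len(xs), lanes):
        chunk = xs[start:start + lanes]        # one "vector" of up to `lanes` elements
        for lane, x in enumerate(chunk):
            acc[lane] += x                     # element-wise vector add
    return sum(acc)                            # horizontal reduction across the lanes
```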
Reduce
• Tiling breaks the work into chunks (tiles); each worker reduces its tile serially, and the per-tile results are then combined
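The tiling idea can be sketched in Python with `concurrent.futures.ThreadPoolExecutor`. The function name `tiled_reduce` and the `tile_size` / `workers` parameters are hypothetical choices for illustration. Under CPython's global interpreter lock the threads show the structure of the pattern rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

def tiled_reduce(combiner, xs, identity, tile_size=4, workers=4):
    """Reduce xs by splitting it into tiles that workers reduce serially."""
    tiles = [xs[i:i + tile_size] for i in range(0, len(xs), tile_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker reduces one tile serially; map() keeps results in tile order.
        partials = list(pool.map(lambda t: reduce(combiner, t, identity), tiles))
    # Combine the per-tile results in order, so only associativity is required.
    return reduce(combiner, partials, identity)
```

For example, `tiled_reduce(operator.add, list(range(1, 101)), 0)` returns 5050.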
Reduce – Add Example
Input: 1 2 4 5 9 7 0 1
Serial reduction: ((((((1 + 2) + 4) + 5) + 9) + 7) + 0) + 1
Running sums (7 dependent steps): 3, 7, 12, 21, 28, 28, 29
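Parallel (tree) reduction, 3 levels:
Level 1: 1 + 2 = 3, 4 + 5 = 9, 9 + 7 = 16, 0 + 1 = 1
Level 2: 3 + 9 = 12, 16 + 1 = 17
Level 3: 12 + 17 = 29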
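The two schedules in the add example can be reproduced with a short Python sketch (`serial_partials` and `tree_levels` are hypothetical helper names). The serial version needs 7 dependent additions; the tree needs only 3 levels, and the additions within one level are independent of each other.

```python
def serial_partials(xs):
    """Left-to-right reduction; returns every intermediate sum (len(xs) - 1 dependent steps)."""
    if not xs:
        return []
    acc, steps = xs[0], []
    for x in xs[1:]:
        acc += x
        steps.append(acc)
    return steps

def tree_levels(xs):
    """Pairwise (tree) reduction; returns the values produced at each level."""
    levels, level = [], list(xs)
    while len(level) > 1:
        # Combine neighbours; an odd element out is carried up unchanged.
        level = [sum(level[i:i + 2]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

data = [1, 2, 4, 5, 9, 7, 0, 1]
```

`serial_partials(data)` gives `[3, 7, 12, 21, 28, 28, 29]` and `tree_levels(data)` gives `[[3, 9, 16, 1], [12, 17], [29]]`.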
Reduce
• We can “fuse” the map and reduce patterns
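A sketch of a fused map-reduce for the common case of summing f(x): each worker applies the map function and accumulates the result in the same loop, so no intermediate list of mapped values is ever stored. The name `fused_map_reduce` and the tiling parameters are hypothetical, and the reduction is fixed to addition for brevity.

```python
from concurrent.futures import ThreadPoolExecutor

def fused_map_reduce(f, xs, tile_size=4, workers=4):
    """Sum f(x) over xs; each worker maps and reduces its tile in a single pass."""
    def tile_work(tile):
        acc = 0
        for x in tile:
            acc += f(x)        # map and reduce fused: f(x) is consumed immediately
        return acc

    tiles = [xs[i:i + tile_size] for i in range(0, len(xs), tile_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(tile_work, tiles))   # combine the per-tile partial sums
```

For example, `fused_map_reduce(lambda x: x * x, [1, 2, 3, 4, 5])` computes the sum of squares, 55.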
Reduce
• Precision can become a problem with reductions on floating point data
• Different orderings of floating point data can change the reduction value
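A small Python illustration of the ordering problem, assuming IEEE 754 double precision (Python's `float`). Adding 1.0 to 1e20 rounds the 1.0 away, so the same three values produce different sums depending on which pair is combined first, which is exactly the freedom a parallel reduction takes. `math.fsum` from the standard library computes a correctly rounded sum and is shown as one possible mitigation.

```python
import math

values = [1e20, 1.0, -1e20]

left_to_right = (values[0] + values[1]) + values[2]   # 1.0 is absorbed by 1e20, result 0.0
regrouped = (values[0] + values[2]) + values[1]       # large terms cancel first, result 1.0
exact = math.fsum(values)                             # correctly rounded sum, 1.0
```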
Reduce Example: Dot Product
• Two vectors of the same length
• Map (*) to multiply the components
• Then reduce with (+) to get the final answer
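The map-then-reduce structure of the dot product, sketched in Python (the function name `dot` is hypothetical). Because Python's built-in `map` is lazy, the multiply and add steps end up fused into a single pass over the data.

```python
from functools import reduce
import operator

def dot(a, b):
    """Dot product as a map (*) over matching components followed by a reduce (+)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    products = map(operator.mul, a, b)         # map: multiply the components
    return reduce(operator.add, products, 0)   # reduce: add the products
```

For example, `dot([1, 2, 3], [4, 5, 6])` is 32, and a perpendicular pair such as `dot([3, 0], [0, 2])` is 0.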
Dot Product – Example Uses
• Essential operation in physics, graphics, video games, …
• Gaming analogy: in Mario Kart, there are “boost pads” on the ground that increase your speed
  • The red vector is your speed (x and y direction)
  • The blue vector is the orientation of the boost pad (x and y direction); larger numbers mean more power
How much boost will you get? For the analogy, imagine the pad multiplies your speed:
  • If you come in going 0, you’ll get nothing
  • If you cross the pad perpendicularly, you’ll get 0 [just like the banana obliteration, it will give you 0x boost in the perpendicular direction]
Ref: http://betterexplained.com/articles/vector-calculus-understanding-the-dot-product/
Dot Product Code Examples
• Dot product code examples available on csserver in directory /home/hwang/cs472/dotproduct
Scan
• The scan pattern produces the partial reductions of an input sequence, generating a new sequence of the same length
• Trickier to parallelize than reduce
• Inclusive scan vs. exclusive scan
  • Inclusive scan: includes the current element in the partial reduction
  • Exclusive scan: excludes the current element; the partial reduction covers only the elements before it
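Serial definitions of both variants in Python, built on `itertools.accumulate` (the `initial` keyword requires Python 3.8+). The exclusive version needs an identity value for its first output, because nothing precedes the first element. The function names are hypothetical.

```python
from itertools import accumulate
import operator

def inclusive_scan(xs, op=operator.add):
    """y[i] = x[0] op x[1] op ... op x[i] (current element included)."""
    return list(accumulate(xs, op))

def exclusive_scan(xs, op=operator.add, identity=0):
    """y[i] = x[0] op ... op x[i-1] (current element excluded); y[0] is the identity."""
    if not xs:
        return []
    return list(accumulate(xs[:-1], op, initial=identity))
```

For `[3, 1, 4, 1, 5]`, the inclusive scan is `[3, 4, 8, 9, 14]` and the exclusive scan is `[0, 3, 4, 8, 9]`.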
Scan – Example Uses
• Lexical comparison of strings – e.g., determine that “strategy” should appear before “stratification” in a dictionary
• Add multi-precision numbers (those that cannot be represented in a single machine word)
• Evaluate polynomials
• Implement radix sort or quicksort
• Delete marked elements in an array
• Dynamically allocate processors
• Lexical analysis – parsing programs into tokens
• Searching for regular expressions
• Labeling components in 2-D images
• Some tree algorithms – e.g., finding the depth of every vertex in a tree
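As a sketch of the “delete marked elements” use (often called stream compaction), an exclusive scan over 0/1 keep flags gives every surviving element its position in the output. Once those positions are known, each write is independent, so the final loop could run in parallel. The function name `delete_marked` is hypothetical.

```python
from itertools import accumulate

def delete_marked(xs, marked):
    """Remove xs[i] wherever marked[i] is True, using an exclusive scan of keep flags."""
    keep = [0 if m else 1 for m in marked]
    # Exclusive scan: slots[i] = number of kept elements before position i.
    slots = list(accumulate(keep[:-1], initial=0)) if keep else []
    out = [None] * sum(keep)
    for x, k, s in zip(xs, keep, slots):
        if k:
            out[s] = x          # independent writes to distinct output slots
    return out
```

For example, `delete_marked(["a", "b", "c", "d", "e"], [False, True, False, True, False])` returns `["a", "c", "e"]`.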
Scan
[Figure: Serial Scan | Parallel Scan]
Scan
• One algorithm for parallelizing scan is to perform an “up sweep” and a “down sweep”
• Reduce the input on the up sweep
• The down sweep produces the intermediate results
[Figure: the up sweep computes the reduction; the down sweep computes the intermediate values]
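The up sweep / down sweep idea can be sketched as the work-efficient exclusive scan commonly credited to Blelloch. This Python version is a hedged illustration, not the course's implementation. It runs each level's loop serially, although the iterations within one level touch disjoint elements and could run in parallel, and it assumes a power-of-two input length. The name `sweep_scan` is hypothetical.

```python
import operator

def sweep_scan(xs, op=operator.add, identity=0):
    """Exclusive scan via an up sweep and a down sweep; returns (scan, total)."""
    n = len(xs)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two length")
    a = list(xs)
    # Up sweep: build a reduction tree in place; a[n - 1] ends up holding the total.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):   # independent within one level
            a[i] = op(a[i - d], a[i])
        d *= 2
    total = a[n - 1]
    # Down sweep: push prefixes back down the tree.
    a[n - 1] = identity
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):   # independent within one level
            left = a[i - d]           # reduction of the left subtree
            a[i - d] = a[i]           # left child gets the prefix before this subtree
            a[i] = op(a[i], left)     # right child gets prefix op left subtree
        d //= 2
    return a, total
```

With addition, `sweep_scan([1, 2, 4, 5, 9, 7, 0, 1])` returns `([0, 1, 3, 7, 12, 21, 28, 28], 29)`: the exclusive prefix sums, plus the full reduction left behind by the up sweep. Passing `max` with identity `float("-inf")` gives the running maximum instead.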
Scan – Maximum Example
Input: 1 2 7 2 4 3 4 0
[Figure: up sweep and down sweep computing the running maximum of the input]
Scan
• Three-phase scan with tiling
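One common arrangement of a three-phase tiled scan, sketched in Python. It assumes the usual structure: (1) reduce each tile in parallel, (2) exclusive-scan the small list of tile totals to get per-tile offsets, and (3) scan each tile in parallel, seeded with its offset. The slide's figure may differ in detail. `three_phase_scan`, `tile_size` and `workers` are hypothetical names, and threads under CPython's GIL only show the structure.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from itertools import accumulate
import operator

def three_phase_scan(xs, op=operator.add, identity=0, tile_size=4, workers=4):
    """Inclusive scan of xs computed in three phases over tiles."""
    tiles = [xs[i:i + tile_size] for i in range(0, len(xs), tile_size)]
    if not tiles:
        return []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1 (parallel): reduce each tile to a single value.
        totals = list(pool.map(lambda t: reduce(op, t, identity), tiles))
        # Phase 2 (serial, one value per tile): exclusive scan of the tile totals
        # gives the offset each tile's elements must be combined with.
        offsets = list(accumulate(totals[:-1], op, initial=identity))
        # Phase 3 (parallel): inclusive scan of each tile, seeded with its offset.
        scanned = list(pool.map(
            lambda t, off: list(accumulate(t, op, initial=off))[1:], tiles, offsets))
    return [y for tile in scanned for y in tile]
```

For example, `three_phase_scan([1, 2, 7, 2, 4, 3, 4, 0], max, float("-inf"), tile_size=3)` returns the running maximum `[1, 2, 7, 7, 7, 7, 7, 7]`.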
Scan
• Just like reduce, we can also fuse the map pattern with the scan pattern
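A minimal serial sketch of a fused map-scan: the map function is applied lazily as the scan consumes each element, so the mapped sequence is never stored as a separate array. The name `fused_map_scan` is hypothetical.

```python
from itertools import accumulate
import operator

def fused_map_scan(f, xs, op=operator.add):
    """Inclusive scan of f(x) over xs, applying f on the fly (no intermediate list)."""
    return list(accumulate(map(f, xs), op))
```

For example, `fused_map_scan(lambda x: x * x, [1, 2, 3, 4])` returns the running sums of squares `[1, 5, 14, 30]`.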
Merge Sort as a reduction
• We can sort an array via a pair of a map and a reduce
• Map each element into a vector containing just that element
• <> is the merge operation: [1,3,5,7] <> [2,6,15] = [1,2,3,5,6,7,15]
• [] is the empty list, the identity for <>
• How fast is this?
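A Python sketch of this map/reduce formulation, where `merge` plays the role of <> and `[]` is its identity (both function names are hypothetical). `functools.reduce` folds from the left, so this builds a left-leaning chain of merges. That chain is the mirror image of the right-biased shape worked through next, and both are a linear chain.

```python
from functools import reduce

def merge(left, right):
    """The <> operator: merge two sorted lists in O(len(left) + len(right)) time."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])     # at most one of these two tails is non-empty
    out.extend(right[j:])
    return out

def sort_by_reduce(xs):
    singletons = [[x] for x in xs]          # map: each element -> a one-element list
    return reduce(merge, singletons, [])    # reduce with <>, starting from []
```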
Right Biased Sort
Start with [14,3,4,8,7,52,1]
Map to [[14],[3],[4],[8],[7],[52],[1]]
Reduce:
[14] <> ([3] <> ([4] <> ([8] <> ([7] <> ([52] <> [1])))))
= [14] <> ([3] <> ([4] <> ([8] <> ([7] <> [1,52]))))
= [14] <> ([3] <> ([4] <> ([8] <> [1,7,52])))
= [14] <> ([3] <> ([4] <> [1,7,8,52]))
= [14] <> ([3] <> [1,4,7,8,52])
= [14] <> [1,3,4,7,8,52]
= [1,3,4,7,8,14,52]
Right Biased Sort Continued
• How long did that take?
• We did O(n) merges…but each one took O(n) time
• O(n²)
• We wanted merge sort, but instead we got insertion sort!
Tree Shaped Sort
Start with [14,3,4,8,7,52,1]
Map to [[14],[3],[4],[8],[7],[52],[1]]
Reduce:
(([14] <> [3]) <> ([4] <> [8])) <> (([7] <> [52]) <> [1])
= ([3,14] <> [4,8]) <> ([7,52] <> [1])
= [3,4,8,14] <> [1,7,52]
= [1,3,4,7,8,14,52]
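The tree shape can be sketched with a generic level-by-level reduction. On the seven-element example it performs exactly the pairings shown above: [3,14], [4,8], [7,52] and [1] first, then [3,4,8,14] and [1,7,52], then the final merge. `heapq.merge` from the Python standard library stands in for <>, and `tree_reduce` / `tree_sort` are hypothetical names.

```python
import heapq

def merge(a, b):
    """<> : merge two sorted lists (heapq.merge performs the linear-time merge)."""
    return list(heapq.merge(a, b))

def tree_reduce(op, items, identity):
    """Combine neighbours level by level, giving a balanced tree of op calls."""
    if not items:
        return identity
    level = list(items)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])   # odd element out moves up unchanged
        level = nxt
    return level[0]

def tree_sort(xs):
    return tree_reduce(merge, [[x] for x in xs], [])
```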
Tree Shaped Sort Performance
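• Even if we only had a single processor, this is better:
  • There are O(log n) levels of merges
  • The merges at each level take O(n) time in total
  • So O(n log n) overall
• But the opportunity for parallelism is not so great:
  • The span is still O(n) assuming a sequential merge, because the final merge alone takes O(n) time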
• Takeaway: the shape of reduction matters!
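To make the takeaway concrete, this sketch counts the elements written by all merges for the two shapes, using the assumption that a merge costs its output length. The function names are hypothetical. For n = 1024 singletons, the right-biased chain writes 524,799 elements (quadratic), while the tree writes 10,240 (n log₂ n).

```python
def chain_merge_cost(n):
    """Elements written when n singletons are merged in a right-biased chain."""
    cost, size = 0, 1
    for _ in range(n - 1):
        size += 1          # merge one more singleton into the growing sorted list
        cost += size       # a merge writes every element of its output
    return cost

def tree_merge_cost(n):
    """Elements written when n singletons are merged pairwise, level by level."""
    cost, sizes = 0, [1] * n
    while len(sizes) > 1:
        nxt = [sizes[i] + sizes[i + 1] for i in range(0, len(sizes) - 1, 2)]
        cost += sum(nxt)
        if len(sizes) % 2:
            nxt.append(sizes[-1])   # odd list out moves up without being merged
        sizes = nxt
    return cost
```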