In-class slides with activities
In-class slides with activities for the parallel merge sort module.
Parallel Algorithms
Sorting and more
Keep hardware in mind
• When considering ‘parallel’ algorithms:
– We have to understand the hardware they will run on
– With sequential algorithms, we do this implicitly
Creative use of processing power
• Lots of data = need for speed
• ~20 years: parallel processing
– Studying how to use multiple processors together
– Really large and complex computations
– Parallel processing was an active sub-field of CS
• Since 2005: the era of multicore is here
– All computers will have more than one processing unit
Traditional Computing Machine
• Von Neumann model:
– The stored-program computer
• What is this?
– Abstractly, what does it look like?
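As a rough illustration of the stored-program idea, the Python sketch below models one control unit running a fetch-decode-execute loop over a single memory that holds both the program and its data. The four-instruction set (`LOAD`, `ADD`, `STORE`, `HALT`) and the single accumulator register are hypothetical simplifications, not a real machine.

```python
def run(memory):
    """Execute the program stored in `memory`, starting at address 0.

    Hypothetical instruction set: LOAD, ADD, STORE, HALT.
    """
    pc = 0   # program counter: address of the next instruction
    acc = 0  # accumulator register
    while True:
        op, *args = memory[pc]  # fetch the instruction and decode it
        pc += 1
        if op == "LOAD":        # acc <- memory[addr]
            acc = memory[args[0]]
        elif op == "ADD":       # acc <- acc + memory[addr]
            acc += memory[args[0]]
        elif op == "STORE":     # memory[addr] <- acc
            memory[args[0]] = acc
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown instruction {op!r} at address {pc - 1}")


# Addresses 0-3 hold the program; addresses 4-6 hold the data.
memory = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT",),
    2, 3, 0,
]
run(memory)
print(memory[6])  # 5
```

The key point is that instructions and data live in the same memory and one control unit steps through them in order.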
New twist: multiple control units
• It’s difficult to make the CPU any faster
– To increase potential speed, add more CPUs
– These CPUs are called cores
• Abstractly, what might this look like in these new machines?
Shared memory model
• Multiple processors can access the same memory locations
• May not scale well
– As we increase the number of ‘cores’
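A minimal sketch of the shared memory model, assuming Python threads stand in for cores: every thread sees the same `results` list, and each thread writes only its own slots. The name `parallel_squares` is hypothetical. CPython's global interpreter lock keeps these threads from speeding up the arithmetic, so this shows the access pattern rather than performance.

```python
import threading


def parallel_squares(values, num_workers=4):
    """Square each value, spreading the indices over several threads."""
    results = [None] * len(values)  # shared memory: every thread sees this list

    def worker(wid):
        # Worker `wid` handles indices wid, wid + num_workers, wid + 2*num_workers, ...
        for i in range(wid, len(values), num_workers):
            results[i] = values[i] * values[i]  # each slot has exactly one writer

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has finished writing
    return results


print(parallel_squares([1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```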
Other ‘parallel’ configurations:
• Clusters of computers
– Network connects them
Other ‘parallel’ configurations
• Massive data centers
Clusters and data centers
• Distributed memory model
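In the distributed memory model, each node has its own memory and shares results only by sending messages over the network. The sketch below simulates that inside one Python process, assuming a `queue.Queue` as a stand-in for the network; `distributed_sum` is a hypothetical name, and a real cluster would use a message-passing library instead of threads.

```python
import queue
import threading


def distributed_sum(data, num_nodes=3):
    """Sum `data` across simulated nodes that communicate only by messages."""
    network = queue.Queue()  # stand-in for the interconnect
    # Split the data once up front; afterward each node touches only its chunk.
    chunks = [data[k::num_nodes] for k in range(num_nodes)]

    def node(node_id, local_data):
        # Compute locally, then send the partial result as a message.
        network.put((node_id, sum(local_data)))

    threads = [threading.Thread(target=node, args=(k, chunks[k]))
               for k in range(num_nodes)]
    for t in threads:
        t.start()
    # The coordinator receives exactly one message from each node.
    total = sum(network.get()[1] for _ in range(num_nodes))
    for t in threads:
        t.join()
    return total


print(distributed_sum(list(range(10))))  # 45
```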
Algorithms
• We will use the term processor for the processing unit that executes instructions
• When considering how to design algorithms for these architectures:
– It is useful to start with a base theoretical model
– Revise when implementing on different hardware with software packages
  • Parallel computing course
– Also consider:
  • Memory location access by ‘competing’/‘cooperating’ processors
  • Theoretical arrangement of the processors
PRAM model
• Parallel Random Access Machine
• Theoretical
• Abstractly, what does it look like?
• How do processors access memory in this PRAM model?
PRAM model
• Why is using the PRAM model useful when studying algorithms?
PRAM model
• Processors working in parallel
– Each trying to access memory values
– Memory value: what do we mean by this?
• When designing algorithms, we need to consider what type of memory access that algorithm requires
• How might our theoretical computer work when many reads and writes are happening at the same time?
Designing algorithms
• With many algorithms, we’re moving data around
– Sorting, for example. Others?
• Concurrent reads (CR) by multiple processors
– Memory is not changed, so there are no ‘conflicts’
• Exclusive writes (EW)
– Design pseudocode so that only one processor at a time writes a data value into any given memory location
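To make the read/write distinction concrete, here is a small hypothetical helper that takes the addresses each processor reads and writes during one synchronous PRAM step and reports the weakest standard model that allows it (EREW, then CREW, then CRCW). Describing a step as per-processor address sets is an assumption made for this illustration.

```python
from collections import Counter


def required_model(reads, writes):
    """Return the weakest of EREW, CREW, CRCW that permits one PRAM step.

    reads[p] / writes[p]: set of addresses processor p reads / writes
    during the step (a hypothetical representation).
    """
    read_counts = Counter(addr for r in reads for addr in r)
    write_counts = Counter(addr for w in writes for addr in w)
    concurrent_read = any(c > 1 for c in read_counts.values())
    concurrent_write = any(c > 1 for c in write_counts.values())
    if concurrent_write:
        # Two processors write one cell: only a concurrent-write model allows it.
        return "CRCW"
    return "CREW" if concurrent_read else "EREW"


# Processor i reads a[i] (address i) and writes b[i] (address 10 + i).
print(required_model([{0}, {1}, {2}], [{10}, {11}, {12}]))  # EREW
# Every processor reads the same cell 0 but writes its own cell.
print(required_model([{0}, {0}, {0}], [{10}, {11}, {12}]))  # CREW
# Every processor writes the same cell 0.
print(required_model([{1}, {2}, {3}], [{0}, {0}, {0}]))     # CRCW
```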
Designing Algorithms
• Arranging the processors
– Helpful for design of the algorithm
  • We can envision how it works
  • We can envision the data access pattern needed
– EREW (exclusive read, exclusive write), CREW (concurrent read, exclusive write), (CRCW: concurrent read, concurrent write)
– Not necessarily how processors are arranged in practice
  • Although some machines have been
– What are some possible arrangements?
– Why might these arrangements prove useful for design?
Arrangements
Sorting in Parallel
Emphasis: merge sort
Sequential merge sort
• Recursive
– Can envision a recursion tree

    function mergesort(m)
        var list left, right
        if length(m) ≤ 1
            return m
        else
            middle = length(m) / 2
            for each x in m up to middle
                add x to left
            for each x in m after middle
                add x to right
            left = mergesort(left)
            right = mergesort(right)
            result = merge(left, right)
            return result
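A runnable Python version of the merge sort pseudocode, for comparison. The slide does not define `merge`, so the one here is an assumed standard two-pointer merge, and `//` replaces `/` so that `middle` is an integer index.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in order (stable)
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One list is exhausted; append whatever remains of the other.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def mergesort(m):
    if len(m) <= 1:
        return m
    middle = len(m) // 2           # integer division
    left = mergesort(m[:middle])   # elements before middle
    right = mergesort(m[middle:])  # elements from middle onward
    return merge(left, right)


print(mergesort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```

Each call splits the list in half, so the recursion tree has about log2 N levels, and the merges on each level touch N elements in total.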
Parallel merge sort
• Shared data: 2 lists in memory
• Sort pairs once in parallel
• The processes merge concurrently
How might we write the pseudocode?
Numbering of processors starts with 0

    s = 2
    while s <= N
        do in parallel N/s steps for proc i
            merge values from i*s to (s*i)+s-1
        s = s*2
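The Python sketch below simulates the bottom-up parallel merge sort pseudocode, assuming N is a power of 2 (otherwise the last value of `s` does not cover the whole array). A thread pool plays the N/s processors in each round; each task merges the two already-sorted halves of its own segment, so no two tasks write the same positions (EREW). The names `merge_segment` and `parallel_merge_sort` are hypothetical, and under CPython the threads illustrate the rounds rather than give a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge as merge_sorted


def merge_segment(a, i, s):
    """Processor i merges the two sorted halves of a[i*s : i*s + s] in place."""
    lo, mid, hi = i * s, i * s + s // 2, i * s + s
    a[lo:hi] = list(merge_sorted(a[lo:mid], a[mid:hi]))


def parallel_merge_sort(values):
    n = len(values)
    if n & (n - 1):
        raise ValueError("this sketch assumes N is a power of 2")
    a = list(values)  # the shared array
    s = 2
    with ThreadPoolExecutor() as pool:
        while s <= n:
            # N/s processors (i = 0 .. N/s - 1) work on disjoint segments, so
            # every write is exclusive. list() waits for the round to finish.
            list(pool.map(lambda i: merge_segment(a, i, s), range(n // s)))
            s *= 2
    return a


print(parallel_merge_sort([5, 2, 7, 1, 8, 3, 6, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

With N = 8 there are three rounds (s = 2, 4, 8) using 4, 2, and then 1 processor, which matches the binary-tree arrangement.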
Parallel Merge Sort
• Work through pseudocode with larger N
• Processor arrangement: binary tree
• Memory access: EREW
• What was the more practical implementation?
Let’s try others
Different from sorting
Activity: Sum N integers
• Suppose we have an array of N integers in memory
• We wish to sum them
– Variant: create a running sum in a new array
• Devise a parallel algorithm for this
– Assume PRAM to start
– What processor arrangement did you use?
– What memory access is required?
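One possible answer, sketched in Python with hypothetical function names: a binary-tree reduction for the sum and a Hillis-Steele scan for the running-sum variant. Each pass of a `while` loop stands for one parallel round (about log2 N rounds); the work inside a round is done sequentially here for simplicity.

```python
def tree_sum(values):
    """Binary-tree reduction: each pass of the while loop is one parallel round."""
    a = list(values)
    n = len(a)
    if n == 0:
        return 0
    stride = 1
    while stride < n:
        # "In parallel": processor i (a multiple of 2*stride) adds a[i + stride]
        # into a[i]. No cell is read or written by two processors (EREW).
        for i in range(0, n - stride, 2 * stride):
            a[i] += a[i + stride]
        stride *= 2
    return a[0]


def prefix_sums(values):
    """Running sums via a Hillis-Steele scan, one list per parallel round."""
    x = list(values)
    d = 1
    while d < len(x):
        # Processor i reads old x[i] and x[i - d] and writes only new[i].
        # Old x[j] is read by processors j and j + d, so as written this
        # round uses concurrent reads (CREW).
        x = [x[i] + x[i - d] if i >= d else x[i] for i in range(len(x))]
        d *= 2
    return x


print(tree_sum(range(1, 11)))        # 55
print(prefix_sums([3, 1, 4, 1, 5]))  # [3, 4, 8, 9, 14]
```

The tree reduction uses a binary-tree processor arrangement; the scan keeps all N processors busy in every round.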
Next Activity
• Now suppose you need an algorithm for multiplying a matrix by a vector
[Figure: Matrix A × Vector X = Result Vector]
• Devise a parallel algorithm for this
– Assume PRAM to start
– Think about what each process will compute; there are options
– What processor arrangement did you use?
– What memory access is required?
Matrix-Vector Multiplication
• The matrix is assumed to be M x N. In other words:
– The matrix has M rows.
– The matrix has N columns.
– For example, a 3 x 2 matrix has 3 rows and 2 columns.
• In matrix-vector multiplication, if the matrix is M x N, then the vector must have dimension N.
– In other words, the vector will have N entries.
– If the matrix is 3 x 2, then the vector must be 2-dimensional.
– This is usually stated as saying the matrix and vector must be conformable.
• If the matrix and vector are conformable, the product of the matrix and the vector is a resultant vector of dimension M.
(So the result could be a different size than the original vector!) For example, if the matrix is 3 x 2 and the vector is 2-dimensional, the result of the multiplication is a 3-dimensional vector.
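The dimension rules above can be checked directly. A minimal Python sketch, assuming the matrix is stored as a list of rows; `matvec` is a hypothetical name.

```python
def matvec(A, x):
    """Multiply an M x N matrix A (a list of M rows) by an N-entry vector x."""
    N = len(x)
    if any(len(row) != N for row in A):
        raise ValueError("not conformable: each row of A needs len(x) entries")
    # Entry i of the result is the dot product of row i of A with x,
    # so the result has one entry per row: M entries.
    return [sum(row[j] * x[j] for j in range(N)) for row in A]


A = [[1, 2],
     [3, 4],
     [5, 6]]           # 3 x 2 matrix
print(matvec(A, [1, 1]))  # [3, 7, 11], a 3-dimensional result
```

Each result entry depends on only one row of the matrix, which is what makes the one-row-per-processor decomposition natural.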
Matrix-Vector Multiplication
• Ways to do a parallel algorithm:
– One row of the matrix per processor
– One element of the matrix per processor
  • There is additional overhead involved. Why?
• What if the number of rows M is larger than the number of processors?
• Emerging theme: how to partition the data
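One way to handle M larger than the number of processors is a block partition: processor p takes a contiguous band of roughly M/P rows. The sketch below assumes a conformable matrix stored as a list of rows and uses a thread pool as the processors; `parallel_matvec` and `num_procs` are hypothetical names. Every processor reads all of the vector, while each result entry is written by exactly one processor, so this needs CREW.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_matvec(A, x, num_procs=2):
    """Row-partitioned matrix-vector product; assumes len(row) == len(x)."""
    M = len(A)
    y = [0] * M  # shared result vector

    def worker(p):
        # Block partition: processor p owns rows p*M//num_procs .. (p+1)*M//num_procs - 1.
        for i in range(p * M // num_procs, (p + 1) * M // num_procs):
            # All processors read x (concurrent reads); only p writes y[i].
            y[i] = sum(a * b for a, b in zip(A[i], x))

    with ThreadPoolExecutor(max_workers=num_procs) as pool:
        list(pool.map(worker, range(num_procs)))  # wait for every processor
    return y


A = [[1, 0], [0, 1], [1, 1], [2, 0], [0, 2]]  # 5 rows, 2 processors
print(parallel_matvec(A, [3, 4], num_procs=2))  # [3, 4, 7, 6, 8]
```

Contiguous bands keep each processor's rows together in memory; an alternative is to deal rows out round-robin, which balances the load differently.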
Expand on previous example
• Matrix-matrix multiplication
[Figure: Matrix × Matrix = ?]
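One possible extension, sketched under the same assumptions as a matrix-vector version (matrices as lists of rows, a thread pool as the processors, hypothetical names): treat each entry C[i][j] as its own task, the dot product of row i of A with column j of B. Many tasks read the same rows and columns, but each entry of C has a single writer, so this is again CREW.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_matmul(A, B):
    """C = A x B with one task per entry C[i][j] (A is M x K, B is K x N, B non-empty)."""
    M, K, N = len(A), len(B), len(B[0])
    if any(len(row) != K for row in A):
        raise ValueError("not conformable: columns of A must equal rows of B")
    C = [[0] * N for _ in range(M)]

    def entry(ij):
        i, j = ij
        # Many tasks read row i of A and column j of B (concurrent reads);
        # only this task writes C[i][j] (exclusive write).
        C[i][j] = sum(A[i][k] * B[k][j] for k in range(K))

    with ThreadPoolExecutor() as pool:
        list(pool.map(entry, [(i, j) for i in range(M) for j in range(N)]))
    return C


print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Assigning one row of C per task instead would reuse the matrix-vector idea directly and create fewer, larger tasks.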