
Page 1

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Chapter 3 Parallel Algorithm Design

Parallel Programming in C with MPI and OpenMP

Michael J. Quinn (slides modified by Wonyong Sung)

Page 2

Parallel Programming

Load balancing: best with data-level parallel processing
Low communication overhead: architecture dependent
Precedence relations and scheduling
Memory/cache/I/O considerations

Page 3

Partitioning

Dividing both computation and data into pieces

Domain (data) decomposition
  Divide the data into pieces; in many cases the best partitioning
  SPMD (Single Program, Multiple Data) paradigm (a sketch follows this list)
  Good for massively parallel distributed-memory multicomputer systems

Functional decomposition
  Divide the computation into pieces
  May be good for heterogeneous multiprocessors
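A minimal SPMD sketch of domain decomposition (not from the original slides): every process runs the same program and uses its rank to pick its own contiguous block of the data. The problem size N and the per-element work are illustrative.

    #include <mpi.h>
    #include <stdio.h>

    #define N 1000000  /* illustrative problem size */

    int main(int argc, char *argv[])
    {
        int id, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        /* Block decomposition: rank id owns elements [lo, hi) */
        long lo = (long)id * N / p;
        long hi = (long)(id + 1) * N / p;

        double local_sum = 0.0;
        for (long i = lo; i < hi; i++)
            local_sum += (double)i;   /* stand-in for real per-element work */

        printf("Rank %d handled %ld elements\n", id, hi - lo);
        MPI_Finalize();
        return 0;
    }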

Page 4

Partitioning Checklist

At least 10× more primitive tasks than processors in the target computer
Minimize redundant computations and redundant data storage
Primitive tasks are roughly the same size
The number of tasks is an increasing function of problem size (scalable partitioning)

Page 5

Communication

Determine the values passed among tasks

Local communication
  A task needs values from a small number of other tasks
  Create channels illustrating the data flow

Global communication
  A significant number of tasks contribute data to perform a computation
  Don't create channels for them early in the design

Page 6

Communication Checklist

Communication operations are balanced among tasks
Each task communicates with only a small group of neighbors
Tasks can perform their communications concurrently
Tasks can perform their computations concurrently

Page 7

Agglomeration

Grouping tasks into larger tasks

Goals
  Eliminate communication between primitive tasks agglomerated into a consolidated task
  Maintain the scalability of the program
  Combine groups of sending and receiving tasks

In MPI programming, the goal is often to create one agglomerated task per processor

Page 8

Agglomeration Checklist

Locality of the parallel algorithm has increased
Replicated computations take less time than the communications they replace
Data replication doesn't affect scalability
Agglomerated tasks have similar computational and communication costs
Number of tasks increases with problem size
Number of tasks is suitable for likely target systems
Tradeoff between agglomeration and code-modification costs is reasonable

Page 9

Mapping

The process of assigning tasks to processors

MPI and OpenMP programming models
  MPI: purely parallel; a parallel algorithm is needed from the start
  OpenMP: fork-join model; start from the sequential version and parallelize incrementally (a sketch follows)

Conflicting goals of mapping
  Maximize processor utilization
  Minimize interprocessor communication

Mapping is an NP-hard problem
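To make the fork-join contrast concrete, a minimal OpenMP sketch (not from the slides): the program runs sequentially except where a pragma forks threads for one loop, so an existing serial program can be parallelized one loop at a time. The array size and loop body are illustrative.

    #include <stdio.h>

    int main(void)
    {
        enum { N = 1000000 };           /* illustrative array size */
        static double a[N];

        /* Execution is sequential until threads fork for this loop... */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * i;
        /* ...and join here; execution continues sequentially. */

        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }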

Page 10

Mapping Decision Tree

Static number of tasks
  Structured communication
    Constant computation time per task
      • Agglomerate tasks to minimize communication
      • Create one task per processor
    Variable computation time per task
      • Cyclically map tasks to processors (see the sketch after this tree)
  Unstructured communication
    • Use a static load-balancing algorithm
Dynamic number of tasks
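The two static mappings above differ only in the owner function. A sketch (not from the slides), assuming n tasks numbered 0..n-1 and the usual block convention where process id owns tasks id·n/p through (id+1)·n/p − 1:

    #include <stdio.h>

    /* Block mapping: contiguous chunks; good when per-task time is constant. */
    int block_owner(int i, int n, int p) { return (int)(((long)p * (i + 1) - 1) / n); }

    /* Cyclic mapping: round-robin; balances variable per-task times. */
    int cyclic_owner(int i, int p) { return i % p; }

    int main(void)
    {
        int n = 10, p = 3;   /* illustrative task and process counts */
        for (int i = 0; i < n; i++)
            printf("task %d: block -> %d, cyclic -> %d\n",
                   i, block_owner(i, n, p), cyclic_owner(i, p));
        return 0;
    }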

Page 11

Mapping Strategy

Static number of tasks: see the decision tree on the previous page
Dynamic number of tasks
  Frequent communication between tasks
    • Use a dynamic load-balancing algorithm, which analyzes the current tasks and produces a new mapping of tasks to processors
  Many short-lived tasks
    • Use a run-time task-scheduling algorithm

Page 12

Task Scheduling

Centralized method
  One manager and many workers
  When a worker processor has nothing to do, it requests a task from the manager
  The manager can sometimes become a bottleneck

Distributed method
  Each processor maintains its own list of tasks
  Push (processors with too many tasks send some to others) and pull strategies

Hybrid method (a minimal manager/worker skeleton follows)
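A hedged sketch of the centralized scheme in MPI; the tag names and integer task ids are invented for this example.

    #include <mpi.h>

    #define TAG_REQ  1  /* worker -> manager: request for work */
    #define TAG_TASK 2  /* manager -> worker: a task id        */
    #define TAG_STOP 3  /* manager -> worker: no tasks left    */

    int main(int argc, char *argv[])
    {
        int id, p, ntasks = 100;          /* illustrative task count */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        if (id == 0) {                    /* manager */
            MPI_Status st;
            int next = 0, dummy, stopped = 0;
            while (stopped < p - 1) {
                MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQ,
                         MPI_COMM_WORLD, &st);
                if (next < ntasks) {      /* hand out the next task id */
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_TASK,
                             MPI_COMM_WORLD);
                    next++;
                } else {                  /* tell this worker to stop */
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                             MPI_COMM_WORLD);
                    stopped++;
                }
            }
        } else {                          /* worker */
            MPI_Status st;
            int task, dummy = 0;
            for (;;) {
                MPI_Send(&dummy, 1, MPI_INT, 0, TAG_REQ, MPI_COMM_WORLD);
                MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_STOP) break;
                /* ... perform the work for this task id ... */
            }
        }
        MPI_Finalize();
        return 0;
    }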

Page 13

Mapping Checklist

Considered designs based on one task per processor and on multiple tasks per processor
Evaluated both static and dynamic task allocation
If dynamic task allocation was chosen, the task allocator is not a performance bottleneck
If static task allocation was chosen, the ratio of tasks to processors is at least 10:1

Page 14

Case Studies

Boundary value problem
Finding the maximum
The n-body problem
Adding data input

Page 15

Partitioning (boundary value problem)

One data item per grid point
Associate one primitive task with each grid point
Two-dimensional domain decomposition

Page 16

Communication

Identify the communication pattern between primitive tasks
Each interior primitive task has three incoming and three outgoing channels
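Once each process has been given a contiguous block of grid points (as the later agglomeration step does), these channels reduce to a nearest-neighbor exchange of ghost cells. A hedged sketch, not from the slides; the block size is illustrative, and MPI_PROC_NULL handles the ends of the rod.

    #include <mpi.h>

    #define LOCAL_N 1000   /* illustrative block size per process */

    int main(int argc, char *argv[])
    {
        int id, p;
        double u[LOCAL_N + 2];  /* interior cells plus two ghost cells */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        int left  = (id > 0)     ? id - 1 : MPI_PROC_NULL;
        int right = (id < p - 1) ? id + 1 : MPI_PROC_NULL;

        for (int i = 0; i < LOCAL_N + 2; i++) u[i] = 0.0;

        /* Send my first interior cell left while receiving my right
           ghost cell, then the mirror image; Sendrecv avoids deadlock. */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[LOCAL_N + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[LOCAL_N], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* ... update interior cells u[1..LOCAL_N] for this iteration ... */
        MPI_Finalize();
        return 0;
    }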

Page 17

Sequential execution time

χ – time to update one element
n – number of elements
m – number of iterations
Sequential execution time: m(n−1)χ

Page 18

Parallel Execution Time

p – number of processors
λ – message latency
Parallel execution time: m(χ⌈(n−1)/p⌉ + 2λ)
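An illustrative evaluation of the two models (these numbers are not from the slides): take n = 10^6, m = 100, χ = 100 ns, λ = 10 µs, and p = 16. Sequential: 100 · (10^6 − 1) · 100 ns ≈ 10 s. Parallel: 100 · (100 ns · ⌈(10^6 − 1)/16⌉ + 2 · 10 µs) = 100 · (6.25 ms + 20 µs) ≈ 0.63 s, a speedup of roughly 16; the 2λ latency term matters only when n/p is small.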

Page 19

Finding the Maximum Error

Computed:   0.15    0.16    0.16    0.19
Correct:    0.15    0.16    0.17    0.18
Error (%):  0.00%   0.00%   6.25%   5.26%

Maximum error: 6.25%

Page 20

Reduction

Given an associative operator ⊕, compute a0 ⊕ a1 ⊕ a2 ⊕ … ⊕ an−1

Examples
  Add
  Multiply
  And, Or
  Maximum, Minimum
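In MPI this whole pattern is a single collective call. A minimal sketch in which each process contributes one illustrative value:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int id, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        double local = (double)id;   /* this rank's partial value (illustrative) */
        double global = 0.0;

        /* Combine with the associative operator MPI_SUM; the result lands
           on rank 0. MPI_PROD, MPI_LAND, MPI_MAX, MPI_MIN, ... select the
           other operators listed above. */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (id == 0) printf("sum = %f\n", global);
        MPI_Finalize();
        return 0;
    }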

Page 21

Binomial Trees

Subgraph of hypercube

Page 22

Finding Global Sum

[Figure: 16 starting values, one per task, in a 4 × 4 grid]

 4   2   0   7
-3   5  -6  -3
 8   1   2   3
-4   4   6  -1

Page 23

Finding Global Sum

[Figure: partial sums after the first pairwise combining step]

 1   7  -6   4
 4   5   8   2

Page 24

Finding Global Sum

[Figure: partial sums after the second step]

 8  -2
 9  10

Page 25

Finding Global Sum

[Figure: partial sums after the third step]

17   8

Page 26

Finding Global Sum

[Figure: final global sum, combined along a binomial tree]

25
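The pairwise combining in the figures can also be written directly with point-to-point messages. A sketch, not from the slides: at distance d, each rank whose d bit is set sends its partial sum to rank id − d and drops out; each rank's starting value is illustrative.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int id, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        double sum = (double)id;   /* this rank's partial sum (illustrative) */

        /* Binomial-tree reduction: log p combining steps; ranks with no
           partner at a given step simply skip it. */
        for (int d = 1; d < p; d *= 2) {
            if (id & d) {
                MPI_Send(&sum, 1, MPI_DOUBLE, id - d, 0, MPI_COMM_WORLD);
                break;                     /* this rank is done */
            } else if (id + d < p) {
                double other;
                MPI_Recv(&other, 1, MPI_DOUBLE, id + d, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                sum += other;
            }
        }
        if (id == 0) printf("global sum = %f\n", sum);
        MPI_Finalize();
        return 0;
    }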

Page 27

Agglomeration

Page 28

Agglomeration

[Figure: each agglomerated task computes a local sum; the local sums are then combined]

Page 29

Partitioning (the n-body problem)

Domain partitioning
  Assume one task per particle
  Each task holds its particle's position and velocity vector

Each iteration (a skeleton follows)
  Get the positions of all other particles
  Compute the particle's new position and velocity
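A hedged skeleton of the iteration just described, assuming n/p particles per process after agglomeration, 1-D positions for brevity, and an omitted force computation:

    #include <mpi.h>

    #define N 1024   /* illustrative total particle count; assume p divides N */

    int main(int argc, char *argv[])
    {
        int id, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        int local_n = N / p;
        double mine[N], all[N];           /* my block and everyone's positions */
        for (int i = 0; i < local_n; i++)
            mine[i] = (double)(id * local_n + i);  /* illustrative positions */

        for (int step = 0; step < 10; step++) {
            /* Every process obtains every particle's position */
            MPI_Allgather(mine, local_n, MPI_DOUBLE,
                          all, local_n, MPI_DOUBLE, MPI_COMM_WORLD);
            /* ... compute forces on my local_n particles from all[0..N-1],
                   then update the positions and velocities in mine[] ... */
        }
        MPI_Finalize();
        return 0;
    }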

Page 30

Gather
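A minimal MPI_Gather sketch (not from the slides): the root collects one value from every process, in rank order.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int id, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        int mine = id * id;          /* illustrative per-process result */
        int all[128];                /* assumes p <= 128 */

        /* Rank 0 receives one int from every process, ordered by rank */
        MPI_Gather(&mine, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (id == 0)
            for (int i = 0; i < p; i++)
                printf("rank %d sent %d\n", i, all[i]);
        MPI_Finalize();
        return 0;
    }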

Page 31

All-gather
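A minimal MPI_Allgather sketch: like a gather followed by a broadcast, every process ends up with the full array, so there is no root argument. The contributed values are illustrative; the n-body skeleton above uses the same call.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int id, p, mine, all[128];   /* assumes p <= 128 */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        mine = 10 * id;              /* illustrative contribution */
        /* Every rank receives every rank's value, in rank order */
        MPI_Allgather(&mine, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

        printf("rank %d sees all[%d] = %d\n", id, p - 1, all[p - 1]);
        MPI_Finalize();
        return 0;
    }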

Page 32

Scatter
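A minimal MPI_Scatter sketch: the inverse of gather, the root deals one block to each process. The values are illustrative.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int id, p, mine, data[128];  /* assumes p <= 128 */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        if (id == 0)                 /* only the root's send buffer matters */
            for (int i = 0; i < p; i++) data[i] = 100 + i;

        /* Root deals out one int to each process */
        MPI_Scatter(data, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);

        printf("rank %d received %d\n", id, mine);
        MPI_Finalize();
        return 0;
    }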

Page 33

Scatter in log p Steps

[Figure: the root's block 12345678 is halved each step, first into 1234 and 5678, then into 12, 34, 56, and 78, reaching all processes in log p steps]

Page 34

Summary: Task/channel Model

Parallel computation
  A set of tasks
  Interactions through channels

Good designs
  Maximize local computations
  Minimize communications
  Scale up

Page 35

Summary: Design Steps

Partition computation
Agglomerate tasks
Map tasks to processors

Goals
  Maximize processor utilization
  Minimize interprocessor communication

Page 36

Summary: Fundamental Algorithms

Reduction
Gather and scatter
All-gather