
Page 1: Parallel Programming Primer

Lecture 1 – Parallel Programming Primer

CPE 458 – Parallel Programming, Spring 2009

Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License. http://creativecommons.org/licenses/by/2.5

Page 2: Parallel Programming Primer

Outline

Introduction to Parallel Computing
Parallel Architectures
Parallel Algorithms
Common Parallel Programming Models
  Message-Passing
  Shared Address Space

Page 3: Parallel Programming Primer

Introduction to Parallel Computing Moore’s Law

The number of transistors that can be placed inexpensively on an integrated circuit will double approximately every 18 months.
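A worked illustration of the doubling claim (not from the slides; the 18-month period and the 15-year horizon are simply the figures used here):

```latex
% Transistor count under an 18-month doubling period, t measured in years:
N(t) = N_0 \cdot 2^{\,t/1.5}, \qquad
% e.g. over 15 years there are 15/1.5 = 10 doublings:
N(15) = N_0 \cdot 2^{10} \approx 1000\,N_0
```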

Self-fulfilling prophecy
  Computer architect goal
  Software developer assumption

Page 4: Parallel Programming Primer

Introduction to Parallel Computing Impediments to Moore’s Law

Theoretical limit: what to do with all that die space?

Design complexity: how do you meet the expected performance increase?

Page 5: Parallel Programming Primer

Introduction to Parallel Computing Parallelism

Continue to increase performance via parallelism.

Page 6: Parallel Programming Primer

Introduction to Parallel Computing

From a software point of view, need to solve demanding problems:
  Engineering simulations
  Scientific applications
  Commercial applications

Need the performance and resource gains afforded by parallelism

Page 7: Parallel Programming Primer

Introduction to Parallel Computing Engineering Simulations

Aerodynamics
Engine efficiency

Page 8: Parallel Programming Primer

Introduction to Parallel Computing Scientific Applications

Bioinformatics
Thermonuclear processes
Weather modeling

Page 9: Parallel Programming Primer

Introduction to Parallel Computing Commercial Applications

Financial transaction processing
Data mining
Web indexing

Page 10: Parallel Programming Primer

Introduction to Parallel Computing

Unfortunately, parallelism greatly increases coding complexity:
  Coordinating concurrent tasks
  Parallelizing algorithms
  Lack of standard environments and support

Page 11: Parallel Programming Primer

Introduction to Parallel Computing The challenge

Provide the abstractions, programming paradigms, and algorithms needed to effectively design, implement, and maintain applications that exploit the parallelism provided by the underlying hardware in order to solve modern problems.

Page 12: Parallel Programming Primer

Parallel Architectures

Standard sequential architecture: a single CPU connected to RAM over a bus

[Diagram: CPU and RAM joined by a bus; the bus is the bottleneck.]

Page 13: Parallel Programming Primer

Parallel Architectures

Use multiple:
  Datapaths
  Memory units
  Processing units

Page 14: Parallel Programming Primer

Parallel Architectures

SIMD: single instruction stream, multiple data streams

[Diagram: one control unit driving several processing units over an interconnect.]

Page 15: Parallel Programming Primer

Parallel Architectures

SIMD

Advantages
  Performs vector/matrix operations well
  Ex: Intel's MMX instruction set

Disadvantages
  Too dependent on the type of computation (Ex: graphics)
  Performance/resource utilization suffers if computations aren't "embarrassingly parallel"

Page 16: Parallel Programming Primer

Parallel Architectures

MIMD: multiple instruction streams, multiple data streams

[Diagram: several processing/control units connected by an interconnect.]

Page 17: Parallel Programming Primer

Parallel Architectures

MIMD

Advantages
  Can be built with off-the-shelf components
  Better suited to irregular data access patterns

Disadvantages
  Requires more hardware (no shared control unit)
  Program/OS must be stored at each processor

Ex: the typical commodity SMP machines we see today

Page 18: Parallel Programming Primer

Parallel Architectures

Task communication

Shared address space
  Use common memory to exchange data
  Communication and replication are implicit

Message passing
  Use send()/receive() primitives to exchange data
  Communication and replication are explicit

Page 19: Parallel Programming Primer

Parallel Architectures

Shared address space

Uniform memory access (UMA)
  Access to a memory location is independent of which processing unit makes the request.

Non-uniform memory access (NUMA)
  Access to a memory location depends on the location of the processing unit relative to the memory accessed.

Page 20: Parallel Programming Primer

Parallel Architectures

Message passing

Each processing unit has its own private memory
Exchange of messages used to pass data

APIs
  Message Passing Interface (MPI)
  Parallel Virtual Machine (PVM)

Page 21: Parallel Programming Primer

Parallel Algorithms

Algorithm
  A finite sequence of instructions, often used for calculation and data processing.

Parallel algorithm
  An algorithm that can be executed a piece at a time on many different processing devices, with the pieces combined at the end to produce the correct result.

Page 22: Parallel Programming Primer

Parallel Algorithms

Challenges

Identifying work that can be done concurrently
Mapping work to processing units
Distributing the work
Managing access to shared data
Synchronizing the various stages of execution

Page 23: Parallel Programming Primer

Parallel Algorithms

Models

A way to structure a parallel algorithm by selecting decomposition and mapping techniques in a manner that minimizes interactions.

Page 24: Parallel Programming Primer

Parallel Algorithms

Models

Data-parallel
Task graph
Work pool
Master-slave
Pipeline
Hybrid

Page 25: Parallel Programming Primer

Parallel Algorithms

Data-parallel

Mapping of work
  Static; tasks -> processes

Mapping of data
  Independent data items assigned to processes (data parallelism)

Page 26: Parallel Programming Primer

Parallel Algorithms

Data-parallel

Computation
  Tasks process data, synchronize to get new data or exchange results, and continue until all data is processed

Load balancing
  Uniform partitioning of data

Synchronization
  Minimal, or a barrier at the end of a phase

Ex: ray tracing (a code sketch follows below)
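A minimal sketch of the data-parallel model (not from the slides; the array sum, the chunk sizes, and the use of POSIX threads are illustrative assumptions): the data is partitioned uniformly, each task works only on its own chunk, and joining the threads acts as the end-of-phase barrier.

```c
/* Data-parallel sketch: statically partition an array across threads,
 * each thread sums its own chunk, and the main thread combines the
 * partial results after joining (the barrier at the end of the phase). */
#include <pthread.h>
#include <stdio.h>

#define N       1000000
#define NTASKS  4

static double data[N];

struct chunk { int lo, hi; double partial; };

static void *sum_chunk(void *arg) {
    struct chunk *c = arg;
    c->partial = 0.0;
    for (int i = c->lo; i < c->hi; i++)
        c->partial += data[i];           /* independent data items, no sharing */
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = 1.0;

    pthread_t tid[NTASKS];
    struct chunk chunks[NTASKS];
    int per = N / NTASKS;

    for (int t = 0; t < NTASKS; t++) {   /* uniform partitioning of the data */
        chunks[t].lo = t * per;
        chunks[t].hi = (t == NTASKS - 1) ? N : (t + 1) * per;
        pthread_create(&tid[t], NULL, sum_chunk, &chunks[t]);
    }

    double total = 0.0;
    for (int t = 0; t < NTASKS; t++) {   /* join acts as the end-of-phase barrier */
        pthread_join(tid[t], NULL);
        total += chunks[t].partial;
    }
    printf("sum = %.0f\n", total);
    return 0;
}
```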

Page 27: Parallel Programming Primer

Parallel Algorithms

Data-parallel

[Diagram: each process P works on an independent data item D and produces an output O.]

Page 28: Parallel Programming Primer

Parallel Algorithms

Task graph

Mapping of work
  Static; tasks are mapped to nodes in a task-dependency graph (task parallelism)

Mapping of data
  Data moves through the graph (source to sink)

Page 29: Parallel Programming Primer

Parallel Algorithms

Task graph

Computation
  Each node processes input from the previous node(s) and sends output to the next node(s) in the graph

Load balancing
  Assign more processes to a given task
  Eliminate graph bottlenecks

Synchronization
  Node data exchange

Ex: parallel quicksort, divide-and-conquer approaches (see the sketch below)
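A minimal divide-and-conquer sketch (not from the slides; the Lomuto partition, the depth cutoff, and the use of POSIX threads are illustrative assumptions): each recursive call is a task, and spawning a thread for one partition exposes the task parallelism in the dependency graph.

```c
/* Parallel quicksort sketch: a new thread handles the left partition
 * while the current thread recurses on the right one, up to a small
 * depth cutoff; below the cutoff the sort stays sequential. */
#include <pthread.h>
#include <stdio.h>

struct job { int *a; int lo, hi, depth; };

static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

static void *psort(void *arg) {
    struct job *j = arg;
    if (j->lo >= j->hi) return NULL;

    /* Lomuto partition around the last element. */
    int pivot = j->a[j->hi], i = j->lo;
    for (int k = j->lo; k < j->hi; k++)
        if (j->a[k] < pivot) swap(&j->a[i++], &j->a[k]);
    swap(&j->a[i], &j->a[j->hi]);

    struct job left  = { j->a, j->lo, i - 1, j->depth - 1 };
    struct job right = { j->a, i + 1, j->hi, j->depth - 1 };

    if (j->depth > 0) {              /* task parallelism: left half in a new thread */
        pthread_t t;
        pthread_create(&t, NULL, psort, &left);
        psort(&right);
        pthread_join(t, NULL);
    } else {                         /* below the cutoff, stay sequential */
        psort(&left);
        psort(&right);
    }
    return NULL;
}

int main(void) {
    int a[] = { 9, 3, 7, 1, 8, 2, 6, 5, 4, 0 };
    struct job root = { a, 0, 9, 2 };   /* depth 2 -> up to 4 concurrent tasks */
    psort(&root);
    for (int i = 0; i < 10; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}
```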

Page 30: Parallel Programming Primer

Parallel Algorithms

Task graph

[Diagram: a task-dependency graph of processes P; data items D enter at the source nodes and outputs O leave at the sinks.]

Page 31: Parallel Programming Primer

Parallel Algorithms

Work pool

Mapping of work/data
  No desired pre-mapping
  Any task may be performed by any process

Computation
  Processes work as data becomes available (or requests arrive)

Page 32: Parallel Programming Primer

Parallel Algorithms

Work pool

Load balancing
  Dynamic mapping of tasks to processes

Synchronization
  Adding/removing work from the input queue

Ex: web server (a code sketch follows below)
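A minimal work-pool sketch (not from the slides; the task counter standing in for an input queue and the fixed worker count are illustrative assumptions): a mutex guards the shared pool, and whichever worker is free grabs the next task, which is what gives the dynamic load balancing.

```c
/* Work pool sketch: a shared queue of task indices protected by a mutex.
 * Workers repeatedly grab the next available task until the pool is empty. */
#include <pthread.h>
#include <stdio.h>

#define NTASKS   32
#define NWORKERS 4

static int next_task = 0;                        /* the "input queue" */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

static void do_task(int id, long worker) {       /* placeholder work item */
    printf("worker %ld handled task %d\n", worker, id);
}

static void *worker(void *arg) {
    long me = (long)arg;
    for (;;) {
        pthread_mutex_lock(&pool_lock);          /* synchronize queue access */
        int task = (next_task < NTASKS) ? next_task++ : -1;
        pthread_mutex_unlock(&pool_lock);
        if (task < 0) break;                     /* pool is empty */
        do_task(task, me);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NWORKERS];
    for (long w = 0; w < NWORKERS; w++)
        pthread_create(&tid[w], NULL, worker, (void *)w);
    for (int w = 0; w < NWORKERS; w++)
        pthread_join(tid[w], NULL);
    return 0;
}
```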

Page 33: Parallel Programming Primer

Parallel Algorithms

Work pool

[Diagram: worker processes P pull tasks from a shared work pool that is fed by an input queue and write results to an output queue.]

Page 34: Parallel Programming Primer

Parallel Algorithms

Master-slave

Modification of the work pool model
  One or more master processes generate and assign work to worker processes

Load balancing
  A master process can better distribute load to worker processes

Page 35: Parallel Programming Primer

Parallel Algorithms

Pipeline

Mapping of work
  Static; processes are assigned tasks that correspond to stages in the pipeline

Mapping of data
  Data processed in FIFO order (stream parallelism)

Page 36: Parallel Programming Primer

Parallel Algorithms

Pipeline

Computation
  Data is passed through a succession of processes, each of which performs some task on it

Load balancing
  Ensure all stages of the pipeline are balanced (contain the same amount of work)

Synchronization
  Producer/consumer buffers between stages

Ex: processor pipeline, graphics pipeline (a code sketch follows below)
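A minimal two-stage pipeline sketch (not from the slides; the one-slot buffer, the squaring "work", and the use of POSIX threads are illustrative assumptions): a mutex and two condition variables implement the producer/consumer buffer between the stages.

```c
/* Pipeline sketch: two stages connected by a one-slot producer/consumer
 * buffer. Stage 1 produces items in FIFO order; stage 2 consumes them. */
#include <pthread.h>
#include <stdio.h>

#define NITEMS 8

static int slot, slot_full = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

static void *stage1(void *arg) {             /* producer stage */
    (void)arg;
    for (int i = 0; i < NITEMS; i++) {
        pthread_mutex_lock(&lock);
        while (slot_full) pthread_cond_wait(&not_full, &lock);
        slot = i * i;                         /* "work" done by this stage */
        slot_full = 1;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static void *stage2(void *arg) {             /* consumer stage */
    (void)arg;
    for (int i = 0; i < NITEMS; i++) {
        pthread_mutex_lock(&lock);
        while (!slot_full) pthread_cond_wait(&not_empty, &lock);
        printf("stage2 got %d\n", slot);      /* "work" done by this stage */
        slot_full = 0;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, stage1, NULL);
    pthread_create(&t2, NULL, stage2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```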

Page 37: Parallel Programming Primer

Parallel Algorithms

Pipeline

[Diagram: stage processes P connected by buffers; data flows from the input queue through the stages to the output queue.]

Page 38: Parallel Programming Primer

Common Parallel Programming Models

Message-Passing
Shared Address Space

Page 39: Parallel Programming Primer

Common Parallel Programming Models Message-Passing

Most widely used for programming parallel computers (clusters of workstations)

Key attributes:
  Partitioned address space
  Explicit parallelization

Process interactions
  Send and receive data

Page 40: Parallel Programming Primer

Common Parallel Programming Models Message-Passing

Communications
  Sending and receiving messages

Primitives
  send(buff, size, destination)
  receive(buff, size, source)
  Blocking vs. non-blocking
  Buffered vs. non-buffered

Message Passing Interface (MPI)
  Popular message-passing library
  ~125 functions

Page 41: Parallel Programming Primer

Common Parallel Programming Models Message-Passing

[Diagram: workstations P1-P4 connected by a network; P1 executes send(buff1, 1024, p3) and P3 executes receive(buff3, 1024, p1) to move the data. An MPI version is sketched below.]
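A minimal MPI version of that exchange (not from the slides; the ranks, message size, and tag are illustrative assumptions, and real MPI calls carry more parameters than the simplified send/receive primitives above):

```c
/* Minimal MPI point-to-point sketch (compile with mpicc, run with
 * mpirun -np 2): rank 0 sends a buffer to rank 1, which receives it. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int buff[1024];
    if (rank == 0) {
        for (int i = 0; i < 1024; i++) buff[i] = i;
        /* blocking send: buffer, count, type, destination, tag, communicator */
        MPI_Send(buff, 1024, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* blocking receive from rank 0 with a matching tag */
        MPI_Recv(buff, 1024, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received 1024 ints, last = %d\n", buff[1023]);
    }

    MPI_Finalize();
    return 0;
}
```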

Page 42: Parallel Programming Primer

Common Parallel Programming Models Shared Address Space

Mostly used for programming SMP machines (multicore chips)

Key attributes
  Shared address space
    Threads
    shmget/shmat UNIX operations (see the sketch below)
  Implicit parallelization

Process/thread communication
  Memory reads/stores
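A minimal sketch of the shmget/shmat System V calls mentioned above (not from the slides; the fork-based parent/child pair and the segment size are illustrative assumptions):

```c
/* shmget/shmat sketch: the parent creates a shared segment, forks, and
 * the child writes a value the parent reads back through the same
 * (shared) address space. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* create a private shared-memory segment big enough for one int */
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (shmid < 0) { perror("shmget"); exit(1); }

    int *shared = shmat(shmid, NULL, 0);     /* map it into our address space */
    *shared = 0;

    if (fork() == 0) {                       /* child: write through shared memory */
        *shared = 42;
        _exit(0);
    }

    wait(NULL);                              /* parent: reads the child's store */
    printf("parent sees %d\n", *shared);

    shmdt(shared);                           /* detach and remove the segment */
    shmctl(shmid, IPC_RMID, NULL);
    return 0;
}
```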

Page 43: Parallel Programming Primer

Common Parallel Programming Models Shared Address Space

Communication
  Read/write memory
  Ex: x++; (see the mutex sketch below)

POSIX thread API (Pthreads)
  Popular thread API
  Operations
    Creation/deletion of threads
    Synchronization (mutexes, semaphores)
    Thread management
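A minimal Pthreads sketch (not from the slides; the counter, the iteration count, and the two-thread setup are illustrative assumptions) showing the shared-memory "x++" from the slide protected by a mutex:

```c
/* Shared-address-space communication through plain loads and stores:
 * two threads increment the same counter. Without the mutex the x++
 * would be a data race; with it the result is always 2 * NITER. */
#include <pthread.h>
#include <stdio.h>

#define NITER 1000000

static long x = 0;                           /* shared state */
static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg) {
    (void)arg;
    for (int i = 0; i < NITER; i++) {
        pthread_mutex_lock(&x_lock);         /* synchronization via a mutex */
        x++;                                 /* read-modify-write of shared memory */
        pthread_mutex_unlock(&x_lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("x = %ld (expected %d)\n", x, 2 * NITER);
    return 0;
}
```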

Page 44: Parallel Programming Primer

Common Parallel Programming Models Shared Address Space

Workstation

[Diagram: an SMP workstation in which threads T1-T4 share data held in a common RAM.]

Page 45: Parallel Programming Primer

Parallel Programming Pitfalls

Synchronization
  Deadlock
  Livelock
  Fairness

Efficiency
  Maximize parallelism

Reliability
  Correctness
  Debugging

Page 46: Parallel Programming Primer

Prelude to MapReduce

MapReduce is a paradigm designed by Google for making a subset (albeit a large one) of distributed problems easier to code

Automates data distribution & result aggregation

Restricts the ways data can interact to eliminate locks (no shared state = no locks!)

Page 47: Parallel Programming Primer

Prelude to MapReduce

Next time… MapReduce parallel programming paradigm

Page 48: Parallel Programming Primer

References

Introduction to Parallel Computing, Grama et al., Pearson Education, 2003

Distributed Systems, http://code.google.com/edu/parallel/index.html

Distributed Computing: Principles and Applications, M.L. Liu, Pearson Education, 2004