Lecture 1 – Parallel Programming Primer
CPE 458 – Parallel Programming, Spring 2009
Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License: http://creativecommons.org/licenses/by/2.5
Outline
- Introduction to Parallel Computing
- Parallel Architectures
- Parallel Algorithms
- Common Parallel Programming Models
  - Message-Passing
  - Shared Address Space
Introduction to Parallel Computing
Moore's Law
"The number of transistors that can be placed inexpensively on an integrated circuit will double approximately every 18 months."
- Self-fulfilling prophecy
- Computer architects' goal
- Software developers' assumption
Introduction to Parallel Computing
Impediments to Moore's Law
- Theoretical limit
  - What to do with all that die space?
- Design complexity
  - How do you meet the expected performance increase?
Introduction to Parallel Computing
Parallelism
- Continue to increase performance via parallelism.
Introduction to Parallel Computing
- From a software point of view, we need to solve demanding problems:
  - Engineering simulations
  - Scientific applications
  - Commercial applications
- We need the performance and resource gains afforded by parallelism.
Introduction to Parallel Computing
Engineering Simulations
- Aerodynamics
- Engine efficiency
Introduction to Parallel Computing
Scientific Applications
- Bioinformatics
- Thermonuclear processes
- Weather modeling
Introduction to Parallel Computing
Commercial Applications
- Financial transaction processing
- Data mining
- Web indexing
Introduction to Parallel Computing
- Unfortunately, parallelism greatly increases coding complexity:
  - Coordinating concurrent tasks
  - Parallelizing algorithms
  - Lack of standard environments and support
Introduction to Parallel Computing
The Challenge
Provide the abstractions, programming paradigms, and algorithms needed to effectively design, implement, and maintain applications that exploit the parallelism provided by the underlying hardware in order to solve modern problems.
Parallel Architectures
Standard sequential architecture:
[Diagram: CPU connected to RAM over a single bus]
- Bottlenecks: the single bus limits the rate at which the CPU can access memory.
Parallel Architectures
Use multiple:
- Datapaths
- Memory units
- Processing units
Parallel Architectures
SIMD: single instruction stream, multiple data streams
[Diagram: one control unit driving several processing units over an interconnect]
Parallel Architectures
SIMD
- Advantages
  - Performs vector/matrix operations well (Ex: Intel's MMX extensions)
- Disadvantages
  - Too dependent on the type of computation (Ex: graphics)
  - Performance/resource utilization suffers if computations aren't "embarrassingly parallel"
Parallel Architectures
MIMD: multiple instruction streams, multiple data streams
[Diagram: several combined processing/control units connected by an interconnect]
Parallel Architectures
MIMD
- Advantages
  - Can be built with off-the-shelf components
  - Better suited to irregular data access patterns
- Disadvantages
  - Requires more hardware (the control unit is not shared)
  - Program/OS must be stored at each processor
- Ex: the typical commodity SMP machines we see today
Parallel Architectures
Task communication
- Shared address space
  - Use common memory to exchange data
  - Communication and replication are implicit
- Message passing
  - Use send()/receive() primitives to exchange data
  - Communication and replication are explicit
Parallel Architectures
Shared address space
- Uniform memory access (UMA): access to a memory location is independent of which processing unit makes the request
- Non-uniform memory access (NUMA): access to a memory location depends on the location of the processing unit relative to the memory accessed
Parallel Architectures
Message passing
- Each processing unit has its own private memory
- Exchange of messages is used to pass data
- APIs: Message Passing Interface (MPI), Parallel Virtual Machine (PVM)
Parallel Algorithms
- Algorithm: a finite sequence of instructions, often used for calculation and data processing
- Parallel algorithm: an algorithm that can be executed a piece at a time on many different processing devices, with the pieces put back together at the end to get the correct result
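To make the definition concrete, here is a minimal sketch in Python (the slides use no particular language; the names `split` and `chunked_sum` are illustrative). Summation is associative, so each piece can be computed independently and the partial results combined at the end:

```python
# Sketch: execute an algorithm "a piece at a time" and recombine the results.
# Each chunk could in principle be summed on a different processing device.

def split(data, n_pieces):
    """Partition data into up to n_pieces contiguous chunks."""
    size = (len(data) + n_pieces - 1) // n_pieces
    return [data[i:i + size] for i in range(0, len(data), size)]

def chunked_sum(data, n_pieces=4):
    partials = [sum(chunk) for chunk in split(data, n_pieces)]  # independent pieces
    return sum(partials)  # put back together at the end

print(chunked_sum(list(range(100))))  # 4950, same as sum(range(100))
```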
Parallel Algorithms
Challenges
- Identifying work that can be done concurrently
- Mapping work to processing units
- Distributing the work
- Managing access to shared data
- Synchronizing the various stages of execution
Parallel Algorithms
Models: a way to structure a parallel algorithm by selecting decomposition and mapping techniques in a manner that minimizes interactions.
Parallel Algorithms
Models
- Data-parallel
- Task graph
- Work pool
- Master-slave
- Pipeline
- Hybrid
Parallel Algorithms
Data-parallel
- Mapping of work: static; tasks -> processes
- Mapping of data: independent data items assigned to processes (data parallelism)
Parallel Algorithms
Data-parallel
- Computation: tasks process data, synchronizing to get new data or exchange results; continue until all data is processed
- Load balancing: uniform partitioning of data
- Synchronization: minimal; a barrier may be needed at the end of a phase
- Ex: ray tracing
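A sketch of the data-parallel model using Python's `multiprocessing.Pool` (the slides prescribe no library; `shade_pixel` is an illustrative stand-in for per-item work such as tracing one ray). `Pool.map` statically partitions the input into uniform chunks, every worker applies the same task to its chunk, and the call returns only when all workers are done, acting as the end-of-phase barrier:

```python
# Data-parallel sketch: static mapping, uniform partitioning, barrier at end.
from multiprocessing import Pool

def shade_pixel(p):
    # Stand-in for real per-item work (e.g., tracing one ray); illustrative.
    return p * p

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # chunksize=2 gives each worker a uniform chunk of the data.
        results = pool.map(shade_pixel, range(8), chunksize=2)
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```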
Parallel Algorithms
Data-parallel
[Diagram: independent data items (D) are assigned to processes (P), each producing an output (O)]
Parallel Algorithms
Task graph
- Mapping of work: static; tasks are mapped to nodes in a task-dependency graph (task parallelism)
- Mapping of data: data moves through the graph (source to sink)
Parallel Algorithms
Task graph
- Computation: each node processes input from the previous node(s) and sends output to the next node(s) in the graph
- Load balancing: assign more processes to a given task; eliminate graph bottlenecks
- Synchronization: node data exchange
- Ex: parallel quicksort, divide-and-conquer approaches
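As an illustration of the divide-and-conquer case, here is a sketch of parallel quicksort in Python using `concurrent.futures` (an assumed choice; in CPython, threads give little CPU speedup because of the GIL, so this shows the task-graph structure rather than real performance). The two recursive subproblems are sibling nodes in the task graph, and the combining step waits on both:

```python
# Task-graph sketch: divide-and-conquer quicksort over a thread pool.
from concurrent.futures import ThreadPoolExecutor

def pquicksort(data, pool, depth=2):
    """Quicksort; subproblems deeper than `depth` run serially."""
    if len(data) <= 1:
        return list(data)
    pivot, rest = data[0], data[1:]
    lo = [x for x in rest if x < pivot]   # independent subproblem 1
    hi = [x for x in rest if x >= pivot]  # independent subproblem 2
    if depth > 0:
        # This node waits on both children (node data exchange) before
        # combining. NOTE: the pool must be big enough for the nested,
        # blocking submissions (>= 2**(depth+1) workers is safe here).
        f_lo = pool.submit(pquicksort, lo, pool, depth - 1)
        f_hi = pool.submit(pquicksort, hi, pool, depth - 1)
        return f_lo.result() + [pivot] + f_hi.result()
    return pquicksort(lo, pool, 0) + [pivot] + pquicksort(hi, pool, 0)

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=8) as pool:
        print(pquicksort([5, 3, 8, 1, 9, 2], pool))  # [1, 2, 3, 5, 8, 9]
```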
Parallel Algorithms
Task graph
[Diagram: data items (D) flow through a graph of processes (P) from source to sink, producing outputs (O)]
Parallel Algorithms
Work pool
- Mapping of work/data: no desired pre-mapping; any task may be performed by any process
- Computation: processes work as data becomes available (or requests arrive)
Parallel Algorithms
Work pool
- Load balancing: dynamic mapping of tasks to processes
- Synchronization: adding/removing work from the input queue
- Ex: web server
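A sketch of the work-pool model with Python threads and a shared `queue.Queue` (illustrative; the slides name no implementation, and `worker`/`run_pool` are made-up names). Any worker takes the next available task, so the mapping of tasks to workers is dynamic:

```python
# Work-pool sketch: a shared input queue gives dynamic load balancing.
import queue
import threading

def worker(tasks, results):
    while True:
        item = tasks.get()
        if item is None:          # sentinel: no more work
            break
        results.put(item * 2)     # stand-in for real task processing

def run_pool(items, n_workers=4):
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in items:            # add work to the input queue
        tasks.put(item)
    for _ in threads:             # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    return sorted(results.queue)  # completion order is nondeterministic

print(run_pool(range(5)))  # [0, 2, 4, 6, 8]
```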
Parallel Algorithms
Work pool
[Diagram: processes (P) take tasks from a shared work pool fed by an input queue and write results to an output queue]
Parallel Algorithms
Master-slave: a modification of the work pool model
- One or more master processes generate and assign work to worker processes
- Load balancing: a master process can better distribute load to worker processes
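A sketch of the master-slave variant, again with Python threads (all names illustrative). Instead of a single shared queue, a master thread assigns each task to a chosen worker's private queue, letting it balance the load explicitly, here by picking the shortest queue:

```python
# Master-slave sketch: the master generates work and chooses who gets it.
import queue
import threading

def slave(inbox, results):
    while True:
        item = inbox.get()
        if item is None:               # sentinel: shut down
            break
        results.put(item + 1)          # stand-in for real work

def master(items, n_workers=3):
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [threading.Thread(target=slave, args=(q, results)) for q in inboxes]
    for t in threads:
        t.start()
    for item in items:
        # The master distributes load: send work to the shortest queue.
        min(inboxes, key=lambda q: q.qsize()).put(item)
    for q in inboxes:
        q.put(None)
    for t in threads:
        t.join()
    return sorted(results.queue)

print(master(range(4)))  # [1, 2, 3, 4]
```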
Parallel Algorithms
Pipeline
- Mapping of work: static; processes are assigned tasks that correspond to stages in the pipeline
- Mapping of data: data is processed in FIFO order (stream parallelism)
Parallel Algorithms
Pipeline
- Computation: data is passed through a succession of processes, each of which performs some task on it
- Load balancing: ensure all stages of the pipeline are balanced (contain the same amount of work)
- Synchronization: producer/consumer buffers between stages
- Ex: processor pipeline, graphics pipeline
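A sketch of the pipeline model in Python (illustrative; the slides prescribe no library): each stage is a thread, and a `queue.Queue` serves as the producer/consumer buffer between successive stages, preserving FIFO order through the stream:

```python
# Pipeline sketch: one thread per stage, queues as inter-stage buffers.
import queue
import threading

def stage(fn, inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:
            outbox.put(None)      # propagate end-of-stream downstream
            break
        outbox.put(fn(item))

def run_pipeline(items, fns):
    buffers = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [threading.Thread(target=stage, args=(fn, buffers[i], buffers[i + 1]))
               for i, fn in enumerate(fns)]
    for t in threads:
        t.start()
    for item in items:            # feed the input queue
        buffers[0].put(item)
    buffers[0].put(None)
    out, sink = [], buffers[-1]   # drain the output queue
    while (item := sink.get()) is not None:
        out.append(item)
    for t in threads:
        t.join()
    return out

# Two illustrative stages: double, then add one.
print(run_pipeline([1, 2, 3], [lambda x: x * 2, lambda x: x + 1]))  # [3, 5, 7]
```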
Parallel Algorithms
Pipeline
[Diagram: processes (P) connected in series by buffers, reading from an input queue and writing to an output queue]
Common Parallel Programming Models
- Message-Passing
- Shared Address Space
Common Parallel Programming Models
Message-Passing
- Most widely used for programming parallel computers (clusters of workstations)
- Key attributes: partitioned address space; explicit parallelization
- Process interactions: send and receive data
Common Parallel Programming Models
Message-Passing
- Communications: sending and receiving messages
- Primitives: send(buff, size, destination), receive(buff, size, source); blocking vs. non-blocking; buffered vs. non-buffered
- Message Passing Interface (MPI): popular message-passing library, ~125 functions
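The send()/receive() primitives can be mimicked in Python with `multiprocessing.Pipe` (an illustrative stand-in, not MPI; `p1`/`p3` echo the slides' process names). Each process has its own address space, and data moves only through explicit, blocking messages:

```python
# Message-passing sketch: private address spaces, explicit send/recv.
from multiprocessing import Pipe, Process

def p1(conn):
    buff1 = [1, 2, 3]
    conn.send(buff1)              # like send(buff1, size, p3)

def p3(conn, out):
    buff3 = conn.recv()           # like receive(buff3, size, p1); blocks
    out.send(sum(buff3))

if __name__ == "__main__":
    a, b = Pipe()                 # channel between p1 and p3
    r_parent, r_child = Pipe()    # channel to report the result back
    procs = [Process(target=p1, args=(a,)),
             Process(target=p3, args=(b, r_child))]
    for p in procs:
        p.start()
    print(r_parent.recv())        # 6
    for p in procs:
        p.join()
```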
Common Parallel Programming Models
Message-Passing
[Diagram: four workstations running processes P1-P4; P1 executes send(buff1, 1024, p3) and P3 executes receive(buff3, 1024, p1) to move the data]
Common Parallel Programming Models
Shared Address Space
- Mostly used for programming SMP machines (multicore chips)
- Key attributes: shared address space (threads; shmget/shmat UNIX operations); implicit parallelization
- Process/thread communication: memory reads/stores
Common Parallel Programming Models
Shared Address Space
- Communication: read/write memory (Ex: x++;)
- POSIX thread (Pthread) API: popular thread API
- Operations: creation/deletion of threads; synchronization (mutexes, semaphores); thread management
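A sketch of shared-address-space communication with Python threads (`threading.Lock` stands in for a Pthreads mutex; the slides show no code). All threads read and write the same variable, and the x++-style update is a read-modify-write, so it must be protected:

```python
# Shared-address-space sketch: threads communicate via shared memory.
import threading

x = 0
lock = threading.Lock()

def increment(n):
    global x
    for _ in range(n):
        with lock:      # mutex around the read-modify-write of x++
            x += 1

threads = [threading.Thread(target=increment, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(x)  # 4000; without the lock, updates could be lost
```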
Common Parallel Programming Models
Shared Address Space
[Diagram: an SMP workstation in which threads T1-T4 share data in RAM]
Parallel Programming Pitfalls
- Synchronization: deadlock, livelock, fairness
- Efficiency: maximize parallelism
- Reliability: correctness, debugging
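One way to sidestep the deadlock pitfall, sketched with Python threads (illustrative names): if every thread acquires the locks in the same global order, a circular wait cannot form:

```python
# Deadlock-avoidance sketch: a consistent global lock order.
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
count = 0

def transfer(n):
    global count
    # Every thread takes lock_a before lock_b; reversing the order in one
    # thread is the classic recipe for deadlock (circular wait).
    for _ in range(n):
        with lock_a:
            with lock_b:
                count += 1   # critical section touching both resources

threads = [threading.Thread(target=transfer, args=(100,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(count)  # 200
```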
Prelude to MapReduce
- MapReduce is a paradigm designed by Google to make a subset (albeit a large one) of distributed problems easier to code
- Automates data distribution and result aggregation
- Restricts the ways data can interact to eliminate locks (no shared state = no locks!)
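A toy word-count sketch of this restricted data flow in Python (illustrative only, not Google's implementation): map emits (key, value) pairs with no shared state, the pairs are grouped by key, and each reduce runs independently, which is why no locks are needed:

```python
# MapReduce-style sketch: map -> group by key -> reduce, no shared state.
from itertools import groupby

def map_fn(line):
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    return (word, sum(counts))

def mapreduce(lines):
    # The sort plays the role of the framework's shuffle/grouping step.
    pairs = sorted(kv for line in lines for kv in map_fn(line))
    return [reduce_fn(k, [v for _, v in group])
            for k, group in groupby(pairs, key=lambda kv: kv[0])]

print(mapreduce(["a b a", "b a"]))  # [('a', 3), ('b', 2)]
```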
Prelude to MapReduce
Next time… MapReduce parallel programming paradigm
References
- Introduction to Parallel Computing, Grama et al., Pearson Education, 2003
- Distributed Systems, http://code.google.com/edu/parallel/index.html
- Distributed Computing: Principles and Applications, M. L. Liu, Pearson Education, 2004