Parallel Programming Languages
Andrew Rau-Chaplin
TRANSCRIPT
Sources: D. Skillicorn and D. Talia, “Models and Languages for Parallel Computation”, ACM Computing Surveys.
Warning: This is very much ONE practitioner's viewpoint! Little attempt has been made to capture the conventional wisdom.
Outline
Introduction to parallel programming
Example languages:
Message passing in MPI
Data-parallel programming in *Lisp
Shared-address-space programming in OpenMP
Cilk
Historically
Supercomputers: highly structured numerical programs; parallelization of loops
Multicomputers: each machine had its own languages/compilers/libraries optimized for its architecture
Parallel computing was for REAL computer scientists: “Parallel programming is tough, but worth it”
Mostly numerical/scientific applications, written using Fortran and parallel numerical libraries
Little other parallel software was written!
Needed: parallel programming abstractions that were
Easy (provide help managing programming complexity) but general!
Portable (across machines) but efficient!
[Diagram: application software and system software layered over diverse architectures — SIMD, message passing, shared memory, dataflow, systolic arrays — all converging on a generic parallel architecture]
Solution: Yet another layer of abstraction!
Parallel Model/Language
Layered Perspective
[Diagram: layers from applications down to hardware]
Parallel applications: CAD, database, scientific modeling
Programming models: multiprogramming, shared address, message passing, data parallel
Communication abstraction (user/system boundary)
Compilation or library
Operating systems support
Communication hardware (hardware/software boundary)
Physical communication medium
[Language = Library = Model]
Programming Model
Conceptualization of the machine that the programmer uses in coding applications: how parts cooperate and coordinate their activities. Specifies communication and synchronization operations.
Multiprogramming: no communication or synchronization at the program level
Shared address space: like a bulletin board
Message passing: like letters or phone calls; explicit point-to-point
Data parallel: more regimented, global actions on data; implemented with shared address space or message passing
What does parallelism add?
Decomposition: how is the work divided into distinct parallel threads?
Mapping: which thread should be executed on which processor?
Communication: how is non-local data acquired?
Synchronization: when must threads know that they have reached a common state?
Skillicorn’s Wish List
What properties should a good model of parallel computation have? Note: the desired properties may be conflicting.
Themes:
What does the programming model handle for the programmer?
How abstract can the model be and still realize efficient programs?
Six Desirable Features
1) Easy to program
Should conceal as much detail as possible.
Example: 100 processors, each with 5 threads, each thread potentially communicating with any other = 500² possible communication states!
Hide: decomposition, mapping, communication, and synchronization.
As much as possible, rely on the translation process to produce the exact structure of the parallel program.
2) Software development methodology
Firm semantic foundation to permit reliable transformation.
Issues: correctness, efficiency, deadlock freedom
[Diagram: the parallel model/language as a layer above the parallel architecture]
3) Architecture-Independent
Should be able to migrate code easily to the next generation of an architecture (short cycle times).
Should be able to migrate code easily from one architecture to another (need to share code).
Even in this space, people are more expensive and harder to maintain than hardware.
4) Easy to understand
For parallel computing to become mainstream:
Easy to go from sequential to parallel
Easy to teach
Favor easy-to-understand tools with clear, if limited, goals over complex ones that may be powerful but are hard to use and master!
5) Guaranteed performance
Guaranteed performance on a useful variety of real machines.
If T(n,p) = c·f(n,p) + low-order terms:
Preserve the order of the complexity
Keep the constants small
A model that is good (not necessarily great) on a range of architectures is attractive!
6) Provide Cost Measures
Cost measures are needed to drive algorithmic design choices: estimated execution time, processor utilization, development costs.
In the sequential world, execution times between machines are proportional (e.g., Machine A is 5 times faster than Machine B).
Two-step model: optimize algorithmically, then code and tune.
6) Provide Cost Measures (cont.)
In parallel, it is not so simple; there is no two-step model. Costs associated with decomposition, mapping, communication, and synchronization may vary independently!
The model must make the estimated cost of operations available at design time.
Need an accounting scheme, or cost model!
Example: how should an algorithm trade off communication vs. local computation?
Summary: Desired Features
Often contradictory.
Some features are more realistic on some architectures.
Room for more than one language/model!
A Six-Way Classification of Parallel Models
1) Nothing explicit, parallelism implicit
2) Parallelism explicit, decomposition implicit
3) Decomposition explicit, mapping implicit
4) Mapping explicit, communication implicit
5) Communication explicit, synchronization implicit
6) Everything explicit
(1 is more abstract, less efficient (?); 6 is less abstract, more efficient (?))
Within Each Classification
Dynamic structure: allows dynamic thread creation; unable to restrict communication; may overrun communication capacity.
Static structure: no dynamic thread creation; may still overrun communication capacity, but the static structure supports cost models for prediction of communication.
Static and communication-limited structure: no dynamic thread creation; can guarantee performance by limiting the frequency and size of communications.
Recent Languages/Systems
Cilk:
http://supertech.csail.mit.edu/cilk/
http://www.cilk.com/
http://software.intel.com/en-us/intel-cilk-plus
MapReduce:
http://labs.google.com/papers/mapreduce.html
http://hadoop.apache.org/
Recent Languages
GPUs (OpenCL & CUDA):
http://www.khronos.org/opencl/
http://www.nvidia.com/object/cuda_home.html
https://developer.nvidia.com/category/zone/cuda-zone
Grid programming:
http://www.globus.org/
http://www.cct.lsu.edu/~gallen/Reports/GridProgrammingPrimer.pdf