Parallel programming (labsticc.univ-brest.fr/~lemarch/eng/cours/algop1pareng.pdf)
1
Parallel Programming
Laurent Lemarchand, LISyC/UBO
Laurent.Lemarchand@univ-brest.fr
2
Course outline
What is parallelism
Parallel computer architectures
Distributed and parallel applications
Message passing model
PVM: functionalities
Using PVM
Combinatorial optimization techniques: exact methods, heuristic methods, parallelization
3
What is parallelism
Execute an algorithm with a set of processors instead of a single one
Divide the algorithm into a set of tasks that can be run independently on disjoint processors
Goal: decrease the runtime needed to solve the problem
3 abstraction levels: architectures, algorithms, programming model
4
Parallel Architectures
Parallel computer: more than one processor, with support for parallel programming
Different architecture types: distributed, centralized
Architecture models: classification of parallel computers (SIMD, MIMD)
5
Parallel Algorithm
Approaches for solving problems with parallel support
Algorithmic models: simplified models of program execution that facilitate program design
Allow theoretical performance analysis
6
Parallel programming
Programming languages which allow the expression of parallel execution
Different abstraction levels
Automatic parallelism from sequential programs: the dream, but not efficient in general for now
Programming style depends (partially) on the targeted architecture
7
Parallelism sources
Examples: control command, movement detection, matrix manipulation
Definitions:
- control parallelism: « many actions at the same time »
- data parallelism: « same action on similar data »
- pipeline parallelism: « workflow »
(the slide illustrates each kind with a scheme)
8
Some vocabulary tips
Parallel vs. distributed computing
- Parallel: homogeneous, high level of dependency among tasks
- Distributed: heterogeneous, physical and logical independence among tasks, client-server
Parallel vs. concurrent computing
- Parallel: processes cooperate on solving a problem
- Concurrent: processes compete for the resources
9
Parallelism grain
- Large grain (task level): program (Task i-1, Task i, Task i+1)
- Average grain (control level): function (func1(), func2(), func3())
- Fine grain (data level): loop (a(0)=.., b(0)=..; a(1)=.., b(1)=..; a(2)=.., b(2)=..)
- Very fine grain (many levels): hardware (+, x, Load)
10
Parallelism – architectural grain
- Intra-processor (computing units): vector, pipeline, ...
- Multi-processor (processes/threads): SMP
- Cluster: many processors, dedicated network (Giganet, Myrinet)
- Constellation: cluster grid
- Grid: many processors, public network (Ethernet)
- Meta-computing: on-demand computation time
11
Architectures – Flynn classification
Classification by instruction flow (single/multiple) and data flow (single/multiple):
- SISD (von Neumann): single instruction flow, single data flow
- SIMD (vector): single instruction flow, multiple data flows
- MISD (systolic): multiple instruction flows, single data flow
- MIMD: multiple instruction flows, multiple data flows
SPMD (single program, multiple data) is a programming style belonging to the MIMD class.
12
Architectures – memory classes
The Flynn classification says nothing about memory organization:
- shared memory: simple model, but the memory is a bottleneck
- distributed memory: scalable, but hard programming aspects
(the slide shows processors P around a single shared memory vs. processors each with their own memory)
13
Clusters vs. Network of workstations
Networks of workstations (NOW):
- set of heterogeneous computers
- geographic dispersion
- non-privative use
- parallel computing as an optional usage of the computers
- standard network connection such as Ethernet
Cluster:
- set of homogeneous computers
- single location
- dedicated computing nodes
- one parallel computing resource
- high performance network (Fast/Gigabit Ethernet, Myrinet)
14
Parallel computer size
«Small» parallel computers: 2 < #nodes < 64; typically multiprocessors (SMP); global memory with a cache consistency mechanism
«Large» parallel computers: 64 < #nodes < a few hundred or thousand; typically multi-computers, often SMP clusters
15
Parallel computer size
Top 10 of the TOP500 list, June 2008 (http://www.top500.org/lists/2008/06); Rmax is the maximal LINPACK performance achieved, in TFlops:
1. DOE/NNSA/LANL (US, 2008): IBM BladeCenter QS22/LS21 Cluster, PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz, Voltaire Infiniband; Rmax 1026.0; 122400 procs; Linux; Cluster
2. DOE/NNSA/LLNL (US, 2007): IBM eServer Blue Gene Solution; Rmax 478.2; 212992 procs; CNK/SLES 9; MPP
3. Argonne (US, 2007): IBM Blue Gene/P Solution; Rmax 450.3; 163840 procs; CNK/SLES 9; MPP
4. U. of Texas (US, 2008): Sun SunBlade x6420, Opteron Quad 2 GHz, Infiniband; Rmax 326.0; 62976 procs; Linux; Cluster
5. DOE/Oak Ridge (US, 2008): Cray XT4 QuadCore 2.1 GHz; Rmax 205.0; 30976 procs; CNL; MPP
6. FZJ (Germany, 2007): IBM Blue Gene/P Solution; Rmax 180.0; 65536 procs; CNK/SLES 9; MPP
7. NMCAC (US, 2007): SGI Altix ICE 8200, Xeon quad core 3.0 GHz; Rmax 133.2; 14336 procs; SLES10 + SGI ProPack 5; MPP
8. TATA SONS (India, 2008): HP Cluster Platform 3000 BL460c, Xeon 53xx 3 GHz, Infiniband; Rmax 132.8; 14384 procs; Linux; Cluster
9. IDRIS (France, 2008): IBM Blue Gene/P Solution; Rmax 112.5; 40960 procs; CNK/SLES 9; MPP
10. Total Exploration Production (France, 2008): SGI Altix ICE 8200EX, Xeon quad core 3.0 GHz; Rmax 106.1; 10240 procs; SLES10 + SGI ProPack 5; MPP
16
Parallel computer applications
Main fields
- Science: plasma physics, quantum mechanics, molecular chemistry
- Engineering: telephony, networks, microelectronics design, mechanical design, air traffic control, command and control
- Forecasting: weather, earthquakes, social and economical models
- Exploration: oceanic exploration, mineral and oil exploration, satellite pictures
17
Parallel computer applications
Main fields
- Military: multihead rockets, radar and sonar tracking and analysis, mapping, strategic data warehouses
- Clinical: scans, medical imaging, protein synthesis, genetics
- AI: robotics, autonomous vehicles, vision, planning, speech recognition
- Visualization: 3D (cinema, video games), pattern recognition
18
Applications – MIMD/SPMD
On a grid or cluster: distributed memory; cheap, widely available
Client/server or P2P
At the functional or programming level: SPMD (1 program) or client/server (MIMD, 2 programs)
19
Grid applications – installation models
Client-server:
- centralized or distributed memory
- caching to avoid congestion
- centralized information
Peer to peer (P2P):
- each peer is both client and server
- work balancing
- distributed information
(the slide shows the two topologies over the Internet: clients c around servers srv with zones of congestion, and client/server peers c/s)
20
Grid applications – client/server
Information grid (cloud): data storage and diffusion (Web, NFS, ...)
Computing grid: exploit available computing power
- meta-computing: book time on supercomputers
- Internet computing (SETI, Decrypton, ...)
(the slide shows clients reaching web servers and a search engine)
21
Applications – P2P grids
Information grid: distributed (Napster, Gnutella, Freenet)
Computing grid: migrate client/server applications to P2P (CGP2P)
22
Introducing message passing
The message passing model:
- a set of parallel processes
- processes running on separate processors
- each processor owns its memory (not shared)
- process communication relies on messages (send and receive)
(the slide shows four processes P1..P4, each with a private memory M, performing sends (E) and receives (R))
23
Introducing message passing
Main functions offered by message passing: data exchange and synchronisation
Suitable for distributed memory architectures (MIMD type): multi-computers, clusters and NOWs
24
Model characteristics
Drawbacks:
- programming is not easy
- explicit management of data distribution, process scheduling and process communication
Consequences: potentially long development cycles, errors, high development costs
25
Model characteristics
Advantages:
- Efficiency: the programmer has many degrees of freedom for optimization and « fine-tuning » of the application; consequence: the best usage of resources for maximal efficiency is possible
- Portability, long-term model: the model has been well known and popular for a long time (standard programming), and many environment implementations already exist; consequence: applications are easily portable, but code portability <> performance portability
26
Designing applications based on the message passing model
2 main tools: Message Passing Interface (MPI) and Parallel Virtual Machine (PVM)
Most message passing applications deployed today are based on one of these two libraries
Both are extensions of existing programming languages: C, C++, Fortran
27
Parallel Virtual Machine
Heterogeneous network (clusters/grids/constellations)
Programming model: loosely coupled network, asynchronous MPMD model (SPMD)
Services: library usage from a C or Fortran host language
28
Programming model: communicating processes
Asynchronous communications between processes
The model is independent from the resources: process localization, communication hardware
Communications: point-to-point or group
(the slide shows four communicating processes P1..P4)
29
Parallel Virtual Machine characteristics
On a loosely connected heterogeneous network
Services:
- creation/destruction of processes
- communications (XDR transport): asynchronous point-to-point (FIFO) or multicast; synchronous point-to-point (rendezvous) or group (synchronisation barrier)
- remote signals
- machine and process management
30
PVM references
Web site: www.epm.ornl.gov/pvm/pvm_home.html
Book: PVM: Parallel Virtual Machine. A Users' Guide and Tutorial for Networked Parallel Computing. Al Geist, Adam Beguelin, Jack Dongarra, Weicheng Jiang, Robert Manchek, Vaidy Sunderam. www.netlib.org/pvm3/book/pvm-book.html
31
Parallel Virtual Machine – virtual computer
Unified view of the hardware: various computing resources, various communication resources
(the slide shows application processes p1..p5 of applications a1..a3 mapped by the virtual machine onto resources: computers c1..c5 and routers r1, r2)
32
Parallel Virtual Machine – virtual computer
An application: a few components (processes) interacting together freely; several applications can run simultaneously
A virtual machine: preliminary enrollment of hardware resources, with possible dynamic management
33
Parallel Virtual Machine – composition
For applications: resource access through an API
For hardware: enrollment within the PVM machine, via a local daemon (pvmd) on each host
(the slide shows processes p1 and p2 on computers c1 and c2, each computer running a pvmd connected through the network)
34
Parallel Virtual Machine – setup
The PVM console handles machine management and process management:

% pvm
pvm> conf
localhost
pvm> add distHost

% pvm
pvm> ps -ef
...
pvm> reset
pvm> halt
pvm> quit
35
Parallel Virtual Machine – console

% pvm
pvm> help
add       Add hosts to virtual machine
alias     Define/list command aliases
conf      List virtual machine configuration
delete    Delete hosts from virtual machine
echo      Echo arguments
export    Add environment variables to spawn export list
getopt    Display PVM options for the console task
halt      Stop pvmds
help      Print helpful information about a command
id        Print console task id
jobs      Display list of running jobs
kill      Terminate tasks
mstat     Show status of hosts
names     List message mailbox names
ps        List tasks
pstat     Show status of tasks
put       Add entry to message mailbox
quit      Exit console
reset     Kill all tasks, delete leftover mboxes
setenv    Display or set environment variables
setopt    Set PVM options - for the console task *only*!
sig       Send signal to task
spawn     Spawn task
trace     Set/display trace event mask
unalias   Undefine command alias
unexport  Remove environment variables from spawn export list
version   Show libpvm version
36
PVM processes
Choice of location: by machine, by architecture, or don't care
Unique tid, similar to a Unix PID
(the slide shows /home/master with tid 10214 and three /home/slave processes with tids 7418, 657 and 7419)

numPs = pvm_spawn("/home/slave", ..., 3, tabTids);
// tabTids: {7419, 7418, 657}
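The elided arguments above follow the full signature pvm_spawn(char *task, char **argv, int flag, char *where, int ntask, int *tids), as used in the master code at the end of this course. A minimal sketch of the same call written out in full; the PvmTaskDefault flag expresses the don't-care placement choice (PvmTaskHost or PvmTaskArch would pin the tasks to the machine or architecture named in where):

#include <stdio.h>
#include "pvm3.h"

int main(void)
{
    int tabTids[3];

    /* spawn 3 instances of /home/slave, letting PVM choose the hosts */
    int numPs = pvm_spawn("/home/slave", (char **)0, PvmTaskDefault,
                          "", 3, tabTids);
    if (numPs < 3)
        fprintf(stderr, "only %d slaves started\n", numPs);

    pvm_exit();
    return 0;
}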
37
PVM processes
The basis of applications
Identified by a unique tid over the whole PVM
Correspond to Unix process executions
(the slide shows a process /home/myAppli with tid 10214)

mtid = pvm_mytid();   // mtid: 10214
prt = pvm_parent();   // prt: the parent's tid
38
PVM communications
Communications by message passing
Point-to-point FIFO communication
Communications are buffered by PVM
(the slide shows messages m1 and m2 queued between /home/p1, tid 7419, and /home/p2, tid 10214)
39
PVM message composition
- Who is the recipient? a tid
- Which kind (tag) of message? allows filtering at the destination
- Which data? typed data (for heterogeneous computers), possibly composite messages
(the slide shows tagged messages, e.g. a "circle" and a "square" message sent to tid 10214, and a composite message 2/50/10 'hello' 20.658/1.008)
40
PVM sending – initialisation
pvm_initsend(PvmDataDefault);

Called from the application (myAppli), initializes the communication buffer in libpvm
Data encoding: raw (free) or XDR, according to the composition of the PVM
41
PVM sending – message assembly
pvm_initsend(PvmDataDefault);
pvm_pkint(tab, 10, 1);
pvm_pkstr("toto");

libpvm copies the data (the 10 ints of tab, then the string "toto") into the buffer
Data types are associated with the function names: pvm_pkint, pvm_pkfloat, pvm_pkstr, ...
Several consecutive packing calls are possible
42
PVM sending – effective sending
pvm_initsend(PvmDataDefault);
pvm_pkint(tab, 10, 1);
pvm_pkstr("toto");
pvm_send(10214, 45);

The message is sent with its tag (45) to the recipient (tid 10214)
It is copied into the daemon's memory space: the send is non-blocking
43
PVM blocking reception – filtering

pvm_recv(exp, gre);

Filtering on the sender exp and/or on the tag gre associated with the message
A don't-care value (-1) is possible:

pvm_recv(-1, gre);
pvm_recv(exp, -1);
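When both filters are wildcards, the actual sender and tag can still be recovered from the returned buffer id with pvm_bufinfo(); a minimal sketch:

int bufid, bytes, tag, src;

bufid = pvm_recv(-1, -1);                 /* accept any sender, any tag */
pvm_bufinfo(bufid, &bytes, &tag, &src);   /* message size, actual tag, actual sender */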
44
PVM non-blocking reception – test

pvm_nrecv(exp, gre);
pvm_probe(exp, gre);

pvm_nrecv receives the message if one has arrived, while pvm_probe only tests for its arrival
Both return a value < 0 on error
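A minimal polling sketch built on pvm_nrecv(); do_local_work() is a hypothetical placeholder for computation overlapped with communication:

int exp = -1, gre = -1;   /* filters: any sender, any tag */
int bufid;

while ((bufid = pvm_nrecv(exp, gre)) == 0)
    do_local_work();      /* hypothetical: compute while nothing has arrived */
/* bufid > 0: a message is now in the receive buffer, ready to unpack */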
45
PVM reception – unpacking

pvm_recv(exp, gre);
pvm_upkint(truc, 10, 1);
pvm_upkstr(ptr);

libpvm copies the data locally (into truc and ptr)
Unpack in the same order as during the sending operation
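Packing and unpacking calls must mirror each other; a minimal sketch of a matched pair, reusing the buffers of the previous slides (dest_tid and the tag 45 are illustrative):

/* sender side; dest_tid: the recipient's tid, e.g. from pvm_spawn() */
int tab[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
pvm_initsend(PvmDataDefault);
pvm_pkint(tab, 10, 1);       /* first the int array ... */
pvm_pkstr("toto");           /* ... then the string */
pvm_send(dest_tid, 45);

/* receiver side: unpack in exactly the same order */
int truc[10];
char ptr[64];                /* large enough for the packed string */
pvm_recv(-1, 45);
pvm_upkint(truc, 10, 1);
pvm_upkstr(ptr);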
46
PVM process groups

pvm_joingroup("group");
pvm_lvgroup("group");

Both return a value < 0 on error
Used for global synchronisation/communications
Implemented using the available hardware
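A minimal sketch of group membership, assuming a group named "workers"; pvm_joingroup() returns the caller's instance number, which pvm_gsize() and pvm_gettid() relate to group size and tids:

int inst = pvm_joingroup("workers");   /* my instance number within the group */
int size = pvm_gsize("workers");       /* current number of members */
int tid0 = pvm_gettid("workers", 0);   /* tid of the member with instance 0 */
/* ... group communications ... */
pvm_lvgroup("workers");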
47
PVM direct array sending

pvm_psend(int tid, int msgtag, void *vp, int cnt, int type)
pvm_precv(int tid, int msgtag, void *vp, int cnt, int type, int *rtid, int *rtag, int *rcnt)

vp, cnt: the data; type: the array element type
On reception, pvm_precv returns the same information as pvm_bufinfo(): actual sender tid, effective tag and message size

Available element types:
PVM_STR, PVM_FLOAT, PVM_BYTE, PVM_CPLX, PVM_SHORT, PVM_DOUBLE, PVM_INT, PVM_DCPLX, PVM_LONG, PVM_UINT, PVM_USHORT, PVM_ULONG
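A minimal sketch of a matched pvm_psend()/pvm_precv() pair, which avoids the explicit pack/unpack steps; dest_tid and the tag 7 are illustrative:

/* sender side; dest_tid: the recipient's tid */
float buf[100];
pvm_psend(dest_tid, 7, buf, 100, PVM_FLOAT);

/* receiver side */
float in[100];
int rtid, rtag, rcnt;
pvm_precv(-1, 7, in, 100, PVM_FLOAT, &rtid, &rtag, &rcnt);
/* rtid, rtag, rcnt: actual sender, actual tag, number of items received */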
48
PVM group communications

pvm_initsend(...);
pvm_pk...(...);
pvm_bcast("group", tag);

pvm_barrier("group", n);

pvm_bcast sends the message to multiple recipients at the same time; message tagging as with pvm_send()
pvm_barrier: synchronisation; every caller is blocked until n members of the group have called it
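A minimal sketch combining the two calls, assuming every task has joined a group named "workers" as on the previous slide:

int value = 42;                    /* illustrative payload */
int n = pvm_gsize("workers");      /* wait for every current member */

/* every member blocks here until n members of "workers" have arrived */
pvm_barrier("workers", n);

/* one member then broadcasts the value, with tag 3, to the group */
pvm_initsend(PvmDataDefault);
pvm_pkint(&value, 1, 1);
pvm_bcast("workers", 3);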
49
Master/Slave principle
The master controls the processes and the global data

PROGRAM
IF (process = master) THEN
    master-code
ELSE
    slave-code
ENDIF
END
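With PVM, this test is typically written with pvm_parent(), since the initial task has no parent; a minimal SPMD-style sketch (one program playing both roles):

#include "pvm3.h"

int main(void)
{
    if (pvm_parent() == PvmNoParent) {
        /* master code: spawn the slaves, distribute data, gather results */
    } else {
        /* slave code: receive data, compute, send the result back */
    }
    pvm_exit();
    return 0;
}

The example on the following slides uses the two-program (MIMD) variant instead, with a separate master.c and slave.c.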
50
PVM example – application
Create n processes from 2 distinct programs (master.c and slave.c):
- first the master, which will create the other processes
- then n-1 slaves, all running the same processing, but on distinct data
(the slide shows /home/master spawning several /home/slave processes)
51
PVM example – execution
A PVM with 2 computers; running the app, then shutting the PVM down:

[on comp1]% pvm
pvm> add comp2
pvm> quit
[on comp1]%

[on comp1]% master
...trace...
[on comp1]% pvm
pvm> halt
[on comp1]%
52
PVM example – master code
#include "pvm3.h"#define SLAVENAME "/home/slave"
main(){ int tids[32]; /* slave task ids */ int n, nproc, numt, i, who, msgtype, nhost, narch; float data[100], result[32];
puts("How many slave programs (1-32)?"); scanf("%d", &nproc); /* start up slave tasks */ numt=pvm_spawn(SLAVENAME, (char**)0, 0, "", nproc, tids);
/* Begin User Program -- dummy data */ n = 100; for( i=0 ; i<n ; i++ ) { data[i] = i*10.8; }
/* Broadcast initial data to slave tasks */ pvm_initsend(PvmDataDefault); pvm_pkint(&nproc, 1, 1); pvm_pkint(tids, nproc, 1); pvm_pkint(&n, 1, 1); pvm_pkfloat(data, n, 1); pvm_mcast(tids, nproc, 0);
/* Wait for results from slaves */ msgtype = 5; for( i=0 ; i<nproc ; i++ ) {
pvm_recv( -1, msgtype );pvm_upkint( &who, 1, 1 );pvm_upkfloat( &result[who], 1, 1 );printf("I got %f from %d\n", result[who], who);
}
/* Program Finished exit PVM before stopping */ pvm_exit();}
53
PVM example – slave code
#include <stdio.h>
#include "pvm3.h"

main()
{
    int mytid;       /* my task id */
    int tids[32];    /* task ids */
    int n, me, i, nproc, master, msgtype;
    float data[100], result;

    /* enroll in pvm */
    mytid = pvm_mytid();

    /* Receive data from master */
    msgtype = 0;
    pvm_recv( -1, msgtype );
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&n, 1, 1);
    pvm_upkfloat(data, n, 1);

    /* Determine which slave I am (0 -- nproc-1) */
    for( i=0; i<nproc ; i++ )
        if( mytid == tids[i] ) { me = i; break; }

    /* Do calculations with data */
    result = work( ... );

    /* Send result to master */
    pvm_initsend( PvmDataDefault );
    pvm_pkint( &me, 1, 1 );
    pvm_pkfloat( &result, 1, 1 );
    msgtype = 5;
    master = pvm_parent();
    pvm_send( master, msgtype );

    /* Program finished. Exit PVM before stopping */
    pvm_exit();
}
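The slides leave work() unspecified; a hypothetical body, consistent with the data each slave has unpacked, could reduce an interleaved slice of the array (the arguments me, n, data and nproc would then have to be passed in at the call site):

/* hypothetical work(): sum every nproc-th element of data, starting at me */
float work(int me, int n, float *data, int nproc)
{
    int i;
    float sum = 0.0f;
    for (i = me; i < n; i += nproc)
        sum += data[i];
    return sum;
}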