Introduction to Parallel and Distributed Systems - INZ0277Wcl – 5 ECTS
Teacher: Jan Kwiatkowski, Office 201/15, D-2 COMMUNICATION
• For questions, email [email protected] with the subject set to your name. Make sure to email from an account I can reply to.
• All course information will be available at https://www.ii.pwr.edu.pl/~kwiatkowski/
Grading Policy
The distribution of grades will be as follows: quizzes - 10%, final exam - 40%, laboratory projects - 25%, homework - 25%. Incremental grades (+/-) and curving are at my discretion. I reserve the right to fail anyone whose test average is failing.
To pass the course you need at least 40% of the points from each activity.
- 5.0 for >= 90%
- 4.0 for >= 75%
- 3.0 for >= 60%
- 2.0 otherwise
TEXTBOOKS
• V. Kumar et al., "Introduction to Parallel Computing", The Benjamin/Cummings Pub., New York 2003.
• Foster I., “Designing and Building Parallel Programs”,
http://www.mcs.anl.gov/dbpp/text/book.html
• Writing Message-Passing Parallel Programs with MPI, Course Notes, http://www.zib.de/zibdoc/mpikurs/mpi-course.pdf
Requirements
• All graded work, in and out of class, must be done independently. Projects may be discussed up to the design stage, but no code may be shared.
• Students are expected to attend class. A penalty of 1% per missed lecture may be imposed.
• No make-up will be given for missed tests. Special circumstances will be discussed individually.
• All programming assignments are expected to be completed on time. A penalty may be imposed for late work.
A taxonomy of parallel architectures
- control mechanism
- SIMD
- MIMD
- address-space organization
- message-passing architecture
- shared-address-space architecture
- UMA
- NUMA
- interconnection networks
- static
- dynamic
- processor granularity
- coarse-grain computers
- medium-grain computers
- fine-grain computers
Flynn's classification
– Single-instruction-stream, single-data-stream (SISD) computers
- typical uniprocessors
- parallelism through pipelining
– Multiple-instruction-stream, single-data-stream (MISD) computers
- rarely used in practice
– Single-instruction-stream, multiple-data-stream (SIMD) computers
- vector and array processors
– Multiple-instruction-stream, multiple-data-stream (MIMD) computers
- multiprocessors
[Figure: Typical shared-address-space architectures. Left: a uniform-memory-access (UMA) computer - processors (P) connected through an interconnection network to shared memory modules (M). Right: a non-uniform-memory-access (NUMA) computer with local memory only - each processor has its own memory, and the processors are connected by an interconnection network.]
[Figure: A typical message-passing architecture - processor/memory pairs (P = processor, M = memory) connected by an interconnection network.]
Dynamic interconnection networks
- Crossbar switching networks
- Bus-based networks
- Multistage interconnection networks
[Figure: A completely non-blocking crossbar switch connecting p processors to b memory banks; each crossing point is a switching element.]
[Figure: A typical bus-based architecture with no cache - processors connected to a global memory over a shared bus.]
Multistage interconnection network
[Figure: p processors (0 to p-1) connected to b memory banks (0 to b-1) through a multistage interconnection network (stage 1 through stage n).]
Omega network
[Figure: An 8x8 omega network connecting inputs 000-111 to outputs 000-111; each 2x2 switching element is set to either pass-through or cross-over.]
[Figure: An example of blocking in an omega network - two messages that need the same link at the same stage cannot be routed simultaneously.]
Cost and performance
[Figure: Cost and performance as a function of the number of processors. Cost grows fastest for the crossbar, then the multistage network, then the bus; performance is highest for the crossbar, then the multistage network, then the bus.]
Static interconnection networks
- Completely-connected networks
- Star-connected networks
- Linear array
- Ring
- Mesh (2D, 3D, wraparound)
- Hypercube
[Figure: Examples of static interconnection networks - a linear array, a ring, a two-dimensional mesh, a two-dimensional wraparound mesh, a completely-connected network, and a star-connected network.]
[Figure: Examples of static interconnection networks - a three-dimensional mesh, and a complete binary tree network (processors at the leaves, switching elements at the internal nodes) with message routing through the tree.]
Hypercube
[Figure: Hypercubes of dimension 0 through 4 (0-D to 4-D); nodes are labelled with binary strings, and two nodes are connected exactly when their labels differ in one bit.]
Evaluating Static Interconnection Networks
Diameter - the maximum distance between any two processors in the network.
Connectivity - a measure of the multiplicity of paths between any two processors.
Arc connectivity - the minimum number of arcs that must be removed from the network to break it into two disconnected networks.
Bisection width - the minimum number of communication links that have to be removed to partition the network into two equal halves.
Channel width - the number of bits that can be communicated simultaneously over a link connecting two processors.
Bisection bandwidth - the minimum volume of communication allowed between any two halves of the network with an equal number of processors.
Cost - for example, the number of communication links.
Message passing parallel programming paradigm
- several instances of the sequential paradigm are considered together
- separate workers or processes
- interact by exchanging information
[Figure: Processor/memory pairs (P = processor, M = memory) connected by a communication network.]
Message Passing Interface – MPI
- an extended message-passing model
- for parallel computers, clusters, and heterogeneous networks
- not a language or compiler specification
- not a specific implementation or product
- supports send/receive primitives for communicating with other workers
- in-order data transfer without data loss
- several point-to-point and collective communications
- MPI supports the development of parallel libraries
- MPI does not make any restrictive assumptions about the underlying hardware architecture
Message Passing Interface – MPI
MPI is very large (125 functions) - MPI’s extensive functionality requires many functions
MPI is very small and simple (6 functions) - many parallel programs can be written with just 6 basic functions
• MPI_Init()
• MPI_Finalize()
• MPI_Comm_rank()
• MPI_Comm_size()
• MPI_Send()
• MPI_Recv()
An example
#include "mpi.h"
#include <stdio.h>
int main(int argc, char** argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello, world! I'm %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
Two main functions
Initializing MPI
– every MPI program must call this routine once, before any other MPI routines
MPI_Init(&argc, &argv);
Clean-up of MPI
– every MPI program must call this routine when all communications have completed
MPI_Finalize();
Communicator
Communicator – MPI processes can only communicate if they share a communicator.
– MPI_COMM_WORLD
- the predefined default communicator, set up by the MPI_Init() call
Communicator
How do you identify different processes?
– an MPI process can query a communicator for information about the group
– a communicator returns the rank of the calling process
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
How many processes are contained within a communicator?
– a communicator returns the # of processes in the communicator
MPI_Comm_size(MPI_COMM_WORLD, &size);
Sending Messages
- communication between only 2 processes
- the source process sends a message to the destination process
- the destination process is identified by its rank in the communicator
[Figure: A communicator containing processes 0-5; a source process sends a message to a destination process identified by rank.]
MPI types
Message
– an array of elements of a particular MPI datatype
Basic MPI datatypes (and the C types they correspond to):
- MPI_CHAR / MPI_UNSIGNED_CHAR : signed / unsigned char
- MPI_SHORT / MPI_UNSIGNED_SHORT : signed / unsigned short int
- MPI_INT / MPI_UNSIGNED : signed / unsigned int
- MPI_LONG / MPI_UNSIGNED_LONG : signed / unsigned long int
- MPI_FLOAT : float
- MPI_DOUBLE : double
Send Message
MPI_Send(void *buf, int count, MPI_Datatype datatype,
         int dest, int tag, MPI_Comm comm);
/* (IN) buf : address of the data to be sent */
/* (IN) count : # of elements of the MPI datatype */
/* (IN) datatype : MPI datatype of each element */
/* (IN) dest : rank of the destination process */
/* (IN) tag : marker used to distinguish message types */
/* (IN) comm : communicator, e.g. MPI_COMM_WORLD */
Receive Message
MPI_Recv(void *buf,
         int count,
         MPI_Datatype datatype,
         int source, /* or MPI_ANY_SOURCE */
         int tag, /* or MPI_ANY_TAG */
         MPI_Comm comm,
         MPI_Status *status);
/* (OUT) buf : address where the received data should be placed */
/* (IN) count : maximum # of elements of the MPI datatype to receive */
/* (IN) source : rank of the source of the message */
/* (IN) tag : receive only a message having this tag */
/* (OUT) status : information about the received message (e.g. actual source and tag), for later use */
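Putting MPI_Send and MPI_Recv together, a minimal sketch (assuming the program is launched with at least two processes, e.g. mpirun -np 2): rank 0 sends one integer to rank 1, and rank 1 inspects the status for the actual source.

```c
#include "mpi.h"
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0 && size > 1) {
        int value = 42;
        /* send one MPI_INT to rank 1 with tag 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value;
        MPI_Status status;
        /* receive one MPI_INT from rank 0; status records the
         * actual source and tag of the matched message */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d from rank %d\n",
               value, status.MPI_SOURCE);
    }
    MPI_Finalize();
    return 0;
}
```

Note that MPI_Recv blocks until a matching message arrives, so the receive completes only after rank 0's send has been matched.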