Distributed Memory Programming With MPI (4)

SSE3054: Multicore Systems | Spring 2014 | Jinkyu Jeong ([email protected])

Roadmap

Hello World in MPI program

Basic APIs of MPI

Example program

• The Trapezoidal Rule in MPI.

Collective communication.

MPI derived datatypes.

Performance evaluation of MPI programs.

Parallel sorting.

Safety in MPI programs.

A PARALLEL SORTING ALGORITHM

Sorting

Parallelizing sorting

• n keys and p = comm_sz processes.

• n/p keys assigned to each process.

• No restrictions on which keys are assigned to which processes.

• When the algorithm terminates:

– The keys assigned to each process should be sorted in (say) increasing order.

– If 0 ≤ q < r < p, then each key assigned to process q should be less than or equal to every key assigned to process r.

Serial bubble sort

Bubble sort cannot be efficiently parallelized

• Inner-loop parallelization

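– P0 compares and swaps a[0] and a[1]

– P1 compares and swaps a[1] and a[2]

– Both processes touch a[1], so the two compare-swaps cannot run at the same time

• Outer-loop parallelization

– Each pass changes the data stored in the array, so the next pass cannot start until the previous one has finished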
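
The dependencies described above are easier to see in code. The following is a minimal sketch of a serial bubble sort in C; the function name `bubble_sort` and the loop variable names are illustrative choices, not taken from the slides.

```c
#include <stddef.h>

/* Serial bubble sort (illustrative sketch).
 * Outer loop: after each pass the largest key of a[0..list_length-1]
 * has moved to position list_length-1.
 * Inner loop: iteration i writes a[i+1], which iteration i+1 then reads,
 * so neighboring iterations cannot safely run in parallel. */
void bubble_sort(int a[], size_t n) {
    for (size_t list_length = n; list_length >= 2; list_length--) {
        for (size_t i = 0; i + 1 < list_length; i++) {
            if (a[i] > a[i + 1]) {
                int temp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = temp;
            }
        }
    }
}
```

Iterations i and i + 1 of the inner loop share the element a[i + 1], and every outer pass reads what the previous pass wrote, which is why neither loop can simply be split across processes.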

Odd-even transposition sort

Even phases, compare-swaps: (a[0], a[1]), (a[2], a[3]), (a[4], a[5]), …

Odd phases, compare-swaps: (a[1], a[2]), (a[3], a[4]), (a[5], a[6]), …

This odd-even transposition sort can be parallelized

• P0 compares and swaps a[0] and a[1]

• P1 compares and swaps a[2] and a[3]

Example: odd-even transposition sort

Start: 5, 9, 4, 3

Even phase: compare-swap (5,9) and (4,3) getting the list 5, 9, 3, 4

Odd phase: compare-swap (9,3) getting the list 5, 3, 9, 4

Even phase: compare-swap (5,3) and (9,4) getting the list 3, 5, 4, 9

Odd phase: compare-swap (5,4) getting the list 3, 4, 5, 9

Serial odd-even transposition sort
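
A minimal C sketch of the serial algorithm, assuming the usual convention that phase 0 is an even phase; the names `odd_even_sort` and `compare_swap` are illustrative. It relies on the known result that n phases are enough to sort n keys.

```c
#include <stddef.h>

/* Put the smaller of the two keys in *lo and the larger in *hi. */
static void compare_swap(int *lo, int *hi) {
    if (*lo > *hi) {
        int temp = *lo;
        *lo = *hi;
        *hi = temp;
    }
}

/* Serial odd-even transposition sort (illustrative sketch).
 * Even phases compare-swap (a[0],a[1]), (a[2],a[3]), ...
 * Odd phases  compare-swap (a[1],a[2]), (a[3],a[4]), ...
 * The pairs within one phase are disjoint, so they are independent. */
void odd_even_sort(int a[], size_t n) {
    for (size_t phase = 0; phase < n; phase++) {
        size_t first = (phase % 2 == 0) ? 0 : 1;
        for (size_t i = first; i + 1 < n; i += 2) {
            compare_swap(&a[i], &a[i + 1]);
        }
    }
}
```

Because the pairs handled inside one phase never overlap, the inner loop is the part that can be divided among processes.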

Parallelizing Odd-even transposition sort

For each phase, processes perform compare and swap of two numbers in parallel

[Figure: the compare-swap pairs of the even phases and of the odd phases, each pair labeled with the process (P0, P1, …) that handles it]

Parallelizing odd-even transposition sort (cont'd)

[Figure: the example above (start: 5, 9, 4, 3) repeated, with each compare-swap pair labeled by the process, P0 or P1, that performs it]

Communications among tasks in odd-even sort

In each phase, a process has to know whether the two values to be compared have changed or not

So, a process has to consult its sibling (neighboring) process before comparing two values

Parallel odd-even transposition sort

Assumption

• Each process first sorts its own n/p keys locally

• Then, in each compare-swap phase, the keys held by two partner processes are redistributed in sorted order: the lower-ranked process keeps the smaller half and the higher-ranked process keeps the larger half

Pseudo-code
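
The overall structure can be sketched without MPI by simulating the p processes one after another in a single C program. This is only an illustration under stated assumptions: n is divisible by p, process r owns the block keys[r*local_n .. (r+1)*local_n - 1], and the function name `simulated_parallel_odd_even_sort` is hypothetical. Since partner blocks are adjacent in the array, sorting the combined range has the same effect as exchanging keys and keeping the smaller half on the lower rank and the larger half on the higher rank. It relies on the known result that p phases suffice for p processes.

```c
#include <stdlib.h>

/* Comparison callback for qsort on ints. */
static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x;
    int b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Serial simulation of parallel odd-even transposition sort.
 * Assumption: n is divisible by p. */
void simulated_parallel_odd_even_sort(int keys[], int n, int p) {
    int local_n = n / p;

    /* Step 1: every "process" sorts its own block of local_n keys. */
    for (int rank = 0; rank < p; rank++) {
        qsort(keys + rank * local_n, (size_t)local_n, sizeof(int), cmp_int);
    }

    /* Step 2: p phases. Even phases pair ranks (0,1), (2,3), ...;
     * odd phases pair ranks (1,2), (3,4), ...; unpaired ranks are idle. */
    for (int phase = 0; phase < p; phase++) {
        int first = (phase % 2 == 0) ? 0 : 1;
        for (int rank = first; rank + 1 < p; rank += 2) {
            /* Stand-in for "exchange keys, lower rank keeps the smaller
             * half, higher rank keeps the larger half". */
            qsort(keys + rank * local_n, (size_t)(2 * local_n),
                  sizeof(int), cmp_int);
        }
    }
}
```

In a real MPI program each iteration of the inner `rank` loop runs on a different process, and the combined sort is replaced by a key exchange followed by a local merge.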

Compute_partner
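
A minimal sketch of how the partner rank could be computed, assuming phase 0 is an even phase. The constant `NO_PARTNER` is a hypothetical stand-in used here so the example compiles without MPI; in an MPI program, `MPI_PROC_NULL` is commonly used for this purpose, since sends to and receives from it return immediately.

```c
/* Hypothetical marker for "idle in this phase" (MPI code would
 * typically use MPI_PROC_NULL instead). */
#define NO_PARTNER (-1)

/* Return the rank this process exchanges keys with in the given phase,
 * or NO_PARTNER if the process is idle in that phase. */
int compute_partner(int phase, int my_rank, int comm_sz) {
    int partner;

    if (phase % 2 == 0) {
        /* Even phase: pairs are (0,1), (2,3), ... */
        partner = (my_rank % 2 != 0) ? my_rank - 1 : my_rank + 1;
    } else {
        /* Odd phase: pairs are (1,2), (3,4), ... */
        partner = (my_rank % 2 != 0) ? my_rank + 1 : my_rank - 1;
    }

    /* Ranks at either end may have no partner in some phases. */
    if (partner < 0 || partner >= comm_sz) {
        partner = NO_PARTNER;
    }
    return partner;
}
```

With four processes, ranks 0 and 3 are idle in every odd phase, while ranks 1 and 2 exchange keys with each other.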

Safety in MPI programs

The MPI standard allows MPI_Send to behave in two different ways:

• it can simply copy the message into an MPI-managed buffer and return,

• or it can block until the matching call to MPI_Recv starts.

Many implementations of MPI set a threshold at which the system switches from buffering to blocking.

Relatively small messages will be buffered by MPI_Send.

Larger messages will cause it to block.

Safety in MPI programs

If the MPI_Send executed by each process blocks, no process will be able to start executing a call to MPI_Recv, and the program will hang or deadlock.

Each process is blocked waiting for an event that will never happen.

(see pseudo-code)

Safety in MPI programs

A program that relies on MPI-provided buffering is said to be unsafe.

Such a program may run without problems for various sets of input, but it may hang or crash with other sets.

MPI_Ssend

An alternative to MPI_Send defined by the MPI standard.

The extra “s” stands for synchronous and MPI_Ssend is guaranteed to block until the matching receive starts.

Restructuring communication

MPI_Sendrecv

An alternative to scheduling the communications ourselves.

Carries out a blocking send and a receive in a single call.

The dest and the source can be the same or different.

Especially useful because MPI schedules the communications so that the program won’t hang or crash.

[Figure: before the call, rank 0 holds A and rank 1 holds B in send_buf; after MPI_Sendrecv(), rank 0 has also received B and rank 1 has also received A in recv_buf]

MPI_Sendrecv

Restructuring communication using MPI_Sendrecv

Merging Numbers during Compare & Swap in a Process
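
After the exchange, each process holds two sorted lists of local_n keys: its own and its partner's. A minimal sketch of the merge step follows, assuming both lists are already sorted in increasing order and that the caller supplies a scratch array of local_n ints; the names `merge_low`, `merge_high` and `temp_keys` are illustrative.

```c
#include <string.h>

/* Lower-ranked partner: keep the local_n smallest of the 2*local_n keys.
 * Merges from the front and stops after local_n keys. */
void merge_low(int my_keys[], const int recv_keys[], int temp_keys[],
               int local_n) {
    int m_i = 0, r_i = 0, t_i = 0;
    while (t_i < local_n) {
        if (my_keys[m_i] <= recv_keys[r_i]) {
            temp_keys[t_i++] = my_keys[m_i++];
        } else {
            temp_keys[t_i++] = recv_keys[r_i++];
        }
    }
    memcpy(my_keys, temp_keys, (size_t)local_n * sizeof(int));
}

/* Higher-ranked partner: keep the local_n largest keys.
 * Merges from the back and stops after local_n keys. */
void merge_high(int my_keys[], const int recv_keys[], int temp_keys[],
                int local_n) {
    int m_i = local_n - 1, r_i = local_n - 1, t_i = local_n - 1;
    while (t_i >= 0) {
        if (my_keys[m_i] >= recv_keys[r_i]) {
            temp_keys[t_i--] = my_keys[m_i--];
        } else {
            temp_keys[t_i--] = recv_keys[r_i--];
        }
    }
    memcpy(my_keys, temp_keys, (size_t)local_n * sizeof(int));
}
```

Each merge touches only local_n keys, so the cost per phase is linear in the block size and no full sort of the 2*local_n keys is needed.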

Run-times of parallel odd-even sort

(times are in milliseconds)

Concluding Remarks

MPI, or the Message-Passing Interface

• An interface for parallel programming on distributed-memory systems

• Supports C, C++, and Fortran

• Many MPI implementations

– Ex. MPICH2

SPMD program

Message passing

• Communicator

• Point-to-point communication

• Collective communication

• Safe use of communication is important

– Ex. MPI_Sendrecv()