
Page 1

Introduction to Parallel Computing

Part Ib

Page 2

Processor Intercommunication

In Part Ib we will look at the interconnection network between processors. Using these connections, various communication patterns can be used to transport information from one or more source processors to one or more destination processors.

Page 3

Processor Topologies (1)

There are several ways in which processors can be interconnected. The most important include:

• Bus
• Star
• Tree
• Fully connected
• Ring
• Mesh
• Wraparound mesh
• Hypercube

Page 4

Topology Issues

Before looking at some of the major processor topologies, we have to know what makes a certain topology well or ill suited to connecting processors.

There are two aspects of topologies that should be looked at: scalability and cost (communication and hardware).

Page 5

Terminology

Communication is one-to-one when one processor sends a message to another. In one-to-all or broadcast, one processor sends a message to all other processors. In all-to-one communication, all processors send their messages to one processor. Other forms of communication include gather and all-to-all.
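These patterns map directly onto the collective operations of a message-passing library. A minimal sketch, assuming MPI as the message-passing layer and at most 64 processors (both assumptions mine, not from the slides):

    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, p, value, sum, all[64];   /* assumes p <= 64 */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);
        value = rank;

        /* one-to-one: processor 0 sends a single value to processor 1 */
        if (p > 1) {
            if (rank == 0)
                MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            else if (rank == 1)
                MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }

        /* one-to-all (broadcast): processor 0's value reaches everyone */
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* all-to-one: every value is combined (here: summed) at processor 0 */
        MPI_Reduce(&value, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        /* gather: every value is collected, uncombined, at processor 0 */
        MPI_Gather(&value, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* all-to-all: every processor ends up with every value */
        MPI_Allgather(&value, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }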

Page 6

Connection Properties

There are three elements that are used in building the interconnection network: the wire, a relay node, and a processor node. The latter two elements can, at a given time, send or receive only one message, no matter how many wires enter or leave the element.

Page 7

Bus Topology (1)

[Diagram: processors P1, P2, P3, P4, P5, … all attached to a single shared bus]

Page 8

Bus Topology (2)

Hardware cost   1
One-to-one      1
One-to-all      1
All-to-all      p
Problems        The bus becomes a bottleneck

Page 9

Star Topology (1)

[Diagram: central processor P0 connected directly to processors P1 through P8]

Page 10

Star Topology (2)

Hardware cost   p – 1
One-to-one      1 or 2
One-to-all      p – 1
All-to-all      2 · (p – 1)
Problems        The central processor becomes a bottleneck

Page 11

Tree Topology (1)

[Diagram: binary tree with processors P1 through P8 at the leaves and relay nodes above them]

Page 12

Tree Topology (2)

Hardware cost   2p – 2, when p is a power of 2
One-to-one      2 · log₂ p
One-to-all      (log₂ p) · (1 + log₂ p)
All-to-all      2 · (log₂ p) · (1 + log₂ p)
Problems        The top node becomes a bottleneck. This can be solved by adding more wires at the top (fat tree).
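To make the formulas concrete, a small worked example (the numbers are mine, not from the slides): for p = 8 processors the tree needs 2·8 – 2 = 14 wires; a one-to-one message travels at most 2 · log₂ 8 = 6 hops (up to the top node and back down); and a one-to-all broadcast takes (log₂ 8) · (1 + log₂ 8) = 3 · 4 = 12 steps.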

Page 13

Tree Topology – One-to-all

[Diagram: one-to-all broadcast in the tree with processors P1 through P8, shown in two snapshots]

Page 14

Fully Connected Topology (1)

[Diagram: processors P1 through P6, with every pair connected directly]

Page 15

Fully Connected Topology (2)

Hardware cost   p · (p – 1) / 2
One-to-one      1
One-to-all      log₂ p
All-to-all      2 · log₂ p
Problems        Hardware cost increases quadratically with respect to p

Page 16

Ring Topology (1)

[Diagram: processors P1 through P6 connected in a ring]

Page 17

Ring Topology (2)

Hardware cost   p
One-to-one      p / 2
One-to-all      p / 2
All-to-all      2 · p / 2
Problems        Processors are loaded with transport jobs, but hardware and communication costs are low
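The all-to-all pattern on a ring can be written as a short message-passing loop. A minimal sketch, assuming MPI and one integer of data per processor (both assumptions mine): in each of the p – 1 steps, every processor passes the value it received most recently to its right-hand neighbour.

    #include <mpi.h>

    /* all-to-all on a ring: each processor starts with one value in
       all[rank]; after p - 1 steps, all[] is complete everywhere */
    void ring_all_to_all(int *all, int rank, int p) {
        int right = (rank + 1) % p;
        int left  = (rank - 1 + p) % p;
        for (int step = 1; step < p; step++) {
            int send_idx = (rank - step + 1 + p) % p;  /* received last step */
            int recv_idx = (rank - step + p) % p;      /* arrives this step  */
            MPI_Sendrecv(&all[send_idx], 1, MPI_INT, right, 0,
                         &all[recv_idx], 1, MPI_INT, left,  0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }

MPI_Sendrecv combines the send and the receive into a single call, which sidesteps the deadlock problems discussed at the end of this part.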

Page 18

2D-Mesh Topology (1)

[Diagram: 4 × 4 mesh of processors P0 through P15]

Page 19

2D-Mesh Topology (2)

Hardware cost   2 · √p · (√p – 1)
One-to-one      2 · (√p – 1)
One-to-all      2 · (√p – 1)
All-to-all      2 · p / 2
Remarks         Scalable in both network and communication cost. Can be found in many architectures.
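The 2 · (√p – 1) entries are simply the worst-case number of hops across a √p × √p mesh. A small helper to illustrate this (my own sketch, not from the slides), routing first along the row and then along the column:

    #include <stdlib.h>

    /* hops between processors a and b in a side × side mesh,
       travelling first along the row, then along the column */
    int mesh_hops(int a, int b, int side) {
        int ax = a % side, ay = a / side;
        int bx = b % side, by = b / side;
        return abs(ax - bx) + abs(ay - by);
    }

For the 4 × 4 mesh above, the worst case is between opposite corners: mesh_hops(0, 15, 4) returns 6, matching 2 · (√16 – 1).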

Page 20

Mesh – All-to-all

[Diagram: all-to-all on a row of five processors P1 through P5, shown step by step from step 0 to step 6; in every step each processor forwards the values it has collected so far to its neighbours, until every processor holds all five values]

Page 21

2D Wrap-around Mesh (1)

[Diagram: 4 × 4 mesh of processors P0 through P15, with extra links connecting the ends of each row and each column]

Page 22

2D Wrap-around Mesh (2)

Hardware cost   2p
One-to-one      2 · √p / 2
One-to-all      2 · √p / 2
All-to-all      4 · √p / 2
Remarks         Scalable in both network and communication cost. Can be found in many architectures.

Page 23

2D Wrap-around – One-to-all

[Diagram: one-to-all broadcast on the 4 × 4 wrap-around mesh of P0 through P15, shown in two snapshots]

Page 24

Hypercube Topology (1)

[Diagram: hypercubes of dimensions 1D, 2D, 3D and 4D]

Page 25

Hypercube Construction

[Diagram: a 2D hypercube is built from two 1D hypercubes, a 3D hypercube from two 2D hypercubes, and a 4D hypercube from two 3D hypercubes, by connecting the corresponding nodes of the two copies]

Page 26

Hypercube Topology (2)

Hardware cost   (p / 2) · log₂ p
One-to-one      log₂ p
One-to-all      log₂ p
All-to-all      2 · log₂ p
Remarks         The most elegant design, also when it comes down to routing algorithms, but difficult to build in hardware.
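The log₂ p broadcast works by sending along one hypercube dimension per step: processor labels differ from a neighbour's label in exactly one bit, and after step i every processor whose label is below 2^(i+1) holds the message. A minimal sketch, assuming MPI and source processor 0 (both assumptions mine):

    #include <mpi.h>

    /* one-to-all broadcast from rank 0 on a d-dimensional hypercube
       (p = 2^d processors), using one dimension per step */
    void hypercube_bcast(int *msg, int count, int d, int rank) {
        for (int i = 0; i < d; i++) {
            int partner = rank ^ (1 << i);   /* neighbour across dimension i */
            if (rank < (1 << i))             /* already holds the message */
                MPI_Send(msg, count, MPI_INT, partner, 0, MPI_COMM_WORLD);
            else if (rank < (1 << (i + 1)))  /* receives it in this step */
                MPI_Recv(msg, count, MPI_INT, partner, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            /* all other processors wait for a later step */
        }
    }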

Page 27

4D Hypercube – One-to-all

Page 28

Communication Issues

There are some things left to be said about communication:

• In general, the time required for data transmission is of the form startup time + package size / transfer speed. So it is more efficient to send one large package instead of many small packages, because the startup time can be high (think of the internet, for example); see the sketch after this list.

• Asynchronous vs. synchronous transfer, and deadlocks.
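A small model of this cost (my own illustration; the startup and per-byte times are invented values): sending n small packages pays the startup n times, while one combined package pays it only once.

    /* transmission-time model: t = startup + bytes * time_per_byte */
    double transfer_time(double startup, double time_per_byte, long bytes) {
        return startup + bytes * time_per_byte;
    }

    /* 1000 packages of 100 bytes vs. one package of 100 000 bytes,
       assuming a 1 ms startup and 10 ns per byte:
         1000 * transfer_time(1e-3, 1e-8, 100)  ->  about 1.001 s
         transfer_time(1e-3, 1e-8, 100000)      ->  about 0.002 s  */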

Page 29

Asynchronous vs. Synchronous

The major difference between asynchronous and synchronous communication is that the first method sends a message and continues, while the second sends a message and waits for the receiving program to receive the message.

Page 30

Example asynchronous comm.

Processor A        Processor B
Instruction A1     Instruction B1
Send to B          Instruction B2
Instruction A2     Instruction B3
Instruction A3     Instruction B4
Instruction A4     Receive from A
Instruction A5     Instruction B5

Page 31

Asynchronous Comm.

• Convenient, because processors do not have to wait for each other.

• However, we often need to know whether or not the destination processor has received the data; this often requires some checking code later in the program.

• We need to know whether the OS supports reliable communication layers.

• The receive instruction may or may not be blocking (see the sketch below).
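A minimal sketch of the example above in MPI (my choice of library; the slides do not name one). MPI_Isend starts the transfer and returns immediately; the later MPI_Wait plays the role of the "checking code" that confirms completion:

    #include <mpi.h>

    void async_example(int rank) {
        int data = 42;
        if (rank == 0) {                        /* processor A */
            MPI_Request req;
            MPI_Isend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            /* instructions A2..A5 run here without waiting */
            MPI_Wait(&req, MPI_STATUS_IGNORE);  /* check: send completed */
        } else if (rank == 1) {                 /* processor B */
            /* instructions B1..B4 run here */
            MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);        /* this receive blocks */
        }
    }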

Page 32

Example synchronous comm.

Processor A        Processor B
Instruction A1     Instruction B1
Send to B          Instruction B2
(A blocked)        Instruction B3
(A blocked)        Instruction B4
(transfer)         Receive from A
Instruction A2     Instruction B5
Instruction A3
Instruction A4
Instruction A5

Page 33

Synchronous comm.

• Both send and receive are blocking.

• Processors have to wait for each other. This reduces efficiency.

• Implicitly offers a synchronisation point.

• Easy to program, because fewer unexpected situations can arise.

• Problem: deadlocks may occur.

Page 34

Deadlocks (1)

A deadlock is a situation in which two or more processors are waiting for each other indefinitely.

Page 35

Deadlocks (2)

Processor A        Processor B
Send to B          Send to A
Receive from B     Receive from A

Both processors block in their send, each waiting for the other to reach its receive.

Note: this only occurs with synchronous communication.

Page 36

Deadlocks (3)

Processor A        Processor B
Send to B          Receive from A
Receive from B     Send to A

Reversing the order on one of the processors removes the deadlock: B receives A's message first, and only then sends its own.
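In MPI terms (my own sketch; MPI_Ssend is MPI's synchronous send), the two slides look as follows. The commented-out symmetric pattern deadlocks; ordering the calls differently on the two processors, as on this slide, avoids it:

    #include <mpi.h>

    void exchange(int rank, int *mine, int *theirs) {
        int other = 1 - rank;   /* assumes exactly two ranks, 0 and 1 */

        /* Deadlocks (2): both ranks block in the synchronous send,
           because neither ever reaches its receive:
             MPI_Ssend(mine, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
             MPI_Recv(theirs, 1, MPI_INT, other, 0, MPI_COMM_WORLD,
                      MPI_STATUS_IGNORE);                             */

        /* Deadlocks (3): one rank receives first, so the sends finish */
        if (rank == 0) {
            MPI_Ssend(mine, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
            MPI_Recv(theirs, 1, MPI_INT, other, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(theirs, 1, MPI_INT, other, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Ssend(mine, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
        }
    }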

Page 37

Deadlocks (4)

[Diagram: processors P1 through P6]

Pattern: P1 → P2 → P5 → P4 → P6 → P3 → P1

If every processor first sends to the next one in this cycle, none of them ever reaches its receive, and the whole cycle deadlocks.

Page 38

End of Part I

Are there any questions regarding Part I?