TRANSCRIPT
Introduction to Parallel Computing
Part Ib
Processor Intercommunication
In Part Ib we will look at the interconnection
network between processors. Using these
connections, various communication patterns
can be used to transport information from
one or more source processors to one or more
destination processors.
Processor Topologies (1)
There are several ways in which processors
can be interconnected. The most important
include
• Bus
• Star
• Tree
• Fully connected
• Ring
• Mesh
• Wraparound mesh
• Hypercube
Topology Issues
Before looking at some of the major processor
topologies we have to know what makes a
certain topology well or ill suited for
connecting processors.
There are two aspects of topologies that
should be looked at: scalability and
cost (communication and hardware).
Terminology
Communication is one-to-one when one
processor sends a message to another one.
In one-to-all or broadcast, one processor sends
a message to all other processors. In all-to-one
communication, all processors send their
message to one processor. Other forms of
communication include gather and all-to-all.
Connection Properties
There are three elements used in building the
interconnection network: the wire, the relay
node, and the processor node. The latter two
elements can at any given time send or receive
only one message, no matter how many wires
enter or leave the element.
Bus Topology (1)
[Figure: processors P1–P5 attached to a common bus]
Bus Topology (2)
Hardware cost: 1
One-to-one: 1
One-to-all: 1
All-to-all: p
Problems: the bus becomes a bottleneck
Star Topology (1)
[Figure: central processor P0 connected to processors P1–P8]
Star Topology (2)
Hardware cost: p − 1
One-to-one: 1 or 2
One-to-all: p − 1
All-to-all: 2·(p − 1)
Problems: the central processor becomes a bottleneck
Tree Topology (1)
[Figure: binary tree with processors P1–P8 at the leaves]
Tree Topology (2)
Hardware cost: 2p − 2, when p is a power of 2
One-to-one: 2·log₂ p
One-to-all: (log₂ p)·(1 + log₂ p)
All-to-all: 2·(log₂ p)·(1 + log₂ p)
Problems: the top node becomes a bottleneck. This can be solved by adding more wires at the top (fat tree).
Tree Topology – One-to-all
[Figure: one-to-all in the tree; the message travels up to the root and back down to the leaves]
Fully Connected Topology (1)
[Figure: processors P1–P6 with a direct wire between every pair]
Fully Connected Topology (2)
Hardware cost: p·(p − 1)/2
One-to-one: 1
One-to-all: log₂ p
All-to-all: 2·log₂ p
Problems: hardware cost increases quadratically with p
Ring Topology (1)
[Figure: processors P1–P6 connected in a ring]
Ring Topology (2)
Hardware cost: p
One-to-one: p / 2
One-to-all: p / 2
All-to-all: 2·p / 2
Problems: processors are loaded with transport jobs, but hardware and communication cost are low
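The p/2 figures follow from the fact that a message can travel around the ring in either direction, so the worst-case distance is half the ring. A small sketch (the helper name `ring_hops` is mine):

```python
def ring_hops(i, j, p):
    """Minimal hop count between processors i and j on a p-processor ring.

    A message can be routed clockwise (|i - j| hops) or counter-clockwise
    (p - |i - j| hops); the router picks the shorter direction.
    """
    d = abs(i - j)
    return min(d, p - d)

p = 6
print(ring_hops(0, 3, p))  # 3: the opposite side of the ring, p / 2 hops
print(max(ring_hops(0, j, p) for j in range(p)))  # worst case is p // 2
```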
2D-Mesh Topology (1)
[Figure: 4×4 mesh of processors P0–P15]
2D-Mesh Topology (2)
Hardware cost: 2·√p·(√p − 1)
One-to-one: 2·(√p − 1)
One-to-all: 2·(√p − 1)
All-to-all: 2·√p / 2
Remarks: scalable both in network and communication cost. Can be found in many architectures.
Mesh – All-to-all
[Figure: all-to-all on a linear array of five processors P1–P5, shown step by step; in every step each processor passes the values it has collected so far to its neighbours, until every processor holds all five values]
2D Wrap-around Mesh (1)
[Figure: 4×4 mesh of processors P0–P15 with wrap-around links joining the ends of each row and each column]
2D Wrap-around Mesh (2)
Hardware cost: 2p
One-to-one: 2·√p / 2
One-to-all: 2·√p / 2
All-to-all: 4·√p / 2
Remarks: scalable both in network and communication cost. Can be found in many architectures.
2D Wrap-around – One-to-all
[Figure: one-to-all on the 4×4 wrap-around mesh, spreading the message first along a row and then along the columns]
Hypercube Topology (1)
[Figure: hypercubes of dimension 1, 2, 3, and 4]
Hypercube Construction
[Figure: a d-dimensional hypercube is built by connecting the corresponding nodes of two (d − 1)-dimensional hypercubes]
Hypercube Topology (2)
Hardware cost: (p / 2)·log₂ p
One-to-one: log₂ p
One-to-all: log₂ p
All-to-all: 2·log₂ p
Remarks The most elegant design, also
when it comes down to routing
algorithms. But difficult to build
in hardware.
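The routing elegance comes from the node numbering: two processors are neighbours exactly when their binary addresses differ in a single bit, so a route is found by correcting the differing bits one at a time. A sketch of this classic dimension-ordered (e-cube) routing; the function name is mine:

```python
def ecube_route(src, dst):
    """Route a message through a hypercube by fixing address bits
    from the lowest dimension upwards (dimension-ordered routing)."""
    path, node = [src], src
    diff = src ^ dst          # bits where the two addresses differ
    bit = 0
    while diff:
        if diff & 1:          # this dimension still differs: cross it
            node ^= 1 << bit
            path.append(node)
        diff >>= 1
        bit += 1
    return path

# In a 4D hypercube (p = 16) no route is longer than log2 p = 4 hops.
print(ecube_route(0b0000, 0b1011))  # [0, 1, 3, 11]
```

The number of hops equals the number of differing bits, which is at most log₂ p, matching the one-to-one cost in the table above.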
4D Hypercube – One-to-all
[Figure: one-to-all in a 4D hypercube]
Communication Issues
There are some things left to be said about
communication:
• In general, the time required for data transmission is of the form startup time + package size / transfer speed. It is therefore more efficient to send one large package instead of many small packages, since the startup time can be high (think, for example, of the internet).
• Asynchronous vs. synchronous transfer, and deadlocks.
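The first point can be illustrated numerically; the constants below are made-up values for illustration only.

```python
# Transmission-time model: t(n) = t_startup + n / transfer_speed,
# i.e. a fixed startup cost plus a per-byte cost for every message.
# The constants are invented for illustration.
t_startup = 1e-3          # 1 ms startup cost per message
t_per_byte = 1e-8         # 10 ns per byte (about 100 MB/s)

def send_time(n_bytes, n_messages=1):
    """Total time to move n_bytes split evenly over n_messages."""
    per_msg_bytes = n_bytes / n_messages
    return n_messages * (t_startup + per_msg_bytes * t_per_byte)

one_big = send_time(1_000_000)           # one 1 MB package
many_small = send_time(1_000_000, 1000)  # same data as 1000 packages
print(one_big, many_small)  # the startup cost dominates the second case
```

With these numbers the single package takes about 11 ms, while the thousand small packages pay the 1 ms startup a thousand times over.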
Asynchronous vs. Synchronous
The major difference between asynchronous
and synchronous communication is that the
first method sends a message and continues,
while the second sends a message and waits
until the receiving program has received the
message.
Example asynchronous comm.
Processor A      Processor B
Instruction A1   Instruction B1
Send to B        Instruction B2
Instruction A2   Instruction B3
Instruction A3   Instruction B4
Instruction A4   Receive from A
Instruction A5   Instruction B5
Asynchronous Comm.
• Convenient because processors do not have to wait for each other.
• However, we often need to know whether or not the destination processor has received the data; this often requires some checking code later in the program.
• Need to know whether the OS supports reliable communication layers.
• Receive instruction may or may not be blocking.
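The asynchronous pattern can be sketched with a buffered queue between two threads; this is a toy model, not a real message-passing library, and the blocking receive shown is only one of the variants mentioned above.

```python
import queue
import threading

# Asynchronous send modelled with a buffered channel: A deposits the
# message and continues immediately; B picks it up only when it reaches
# its own receive instruction.
channel = queue.Queue()
log = []

def processor_a():
    log.append("A1")
    channel.put("data")      # send to B: returns at once, A does not wait
    log.append("A2")         # A continues regardless of B's progress

def processor_b():
    log.append("B1")
    msg = channel.get()      # receive from A (blocking variant)
    log.append(f"B2 got {msg}")

ta = threading.Thread(target=processor_a)
tb = threading.Thread(target=processor_b)
ta.start(); tb.start()
ta.join(); tb.join()
print(log)
```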
Example synchronous comm.
Processor A      Processor B
Instruction A1   Instruction B1
Send to B        Instruction B2
(A waits)        Instruction B3
(A waits)        Instruction B4
(A waits)        Receive from A
Instruction A2   Instruction B5
Instruction A3
Instruction A4
Instruction A5
Synchronous comm.
• Both send and receive are blocking
• Processors have to wait for each other. This reduces efficiency.
• Implicitly offers a synchronisation point.
• Easy to program because fewer unexpected situations can arise.
• Problem: Deadlocks may occur.
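The rendezvous behaviour can be modelled with a data queue plus an acknowledgement queue; the helper names are mine, and this is a sketch of the idea rather than how a real library implements it.

```python
import queue
import threading

# Synchronous (rendezvous) send: the sender blocks until the receiver
# has actually taken the message.
data, ack = queue.Queue(), queue.Queue()
events = []

def sync_send(msg):
    data.put(msg)
    ack.get()                      # block until the receiver confirms
    events.append("send done")

def sync_recv():
    msg = data.get()
    events.append("recv done")
    ack.put(None)                  # release the blocked sender
    return msg

recv_thread = threading.Thread(target=sync_recv)
recv_thread.start()
sync_send("hello")                 # returns only after the receive
recv_thread.join()
print(events)  # "recv done" always precedes "send done"
```

Because the sender cannot finish before the receiver, the pair of calls forms the implicit synchronisation point mentioned above.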
Deadlocks (1)
A deadlock is a situation in which two or more
processors wait for each other indefinitely.
Deadlocks (2)
Processor A      Processor B
Send to B        Send to A
Receive from B   Receive from A
Note: Only occurs with synchronous communication.
Deadlocks (3)
Processor A      Processor B
Send to B        Receive from A
Receive from B   Send to A
Here processor B receives before it sends, so the exchange completes without deadlock.
Deadlocks (4)
[Figure: processors P1–P6 connected in a ring]
Pattern: P1 → P2 → P5 → P4 → P6 → P3 → P1
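Such a circular wait can be found mechanically: build the wait-for graph of the pattern and look for a cycle. A small detection sketch (graph representation and helper name are mine):

```python
# Wait-for graph of the pattern above: each processor waits for the next.
waits_for = {"P1": "P2", "P2": "P5", "P5": "P4",
             "P4": "P6", "P6": "P3", "P3": "P1"}

def has_cycle(graph):
    """Detect a cycle in a wait-for graph where each node waits for
    at most one other node: follow the chain from every start node
    and report True if we revisit a node."""
    for start in graph:
        seen, node = set(), start
        while node in graph:
            if node in seen:
                return True
            seen.add(node)
            node = graph[node]
    return False

print(has_cycle(waits_for))  # True: P1 → P2 → P5 → P4 → P6 → P3 → P1
```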
End of Part I
Are there any questions regarding Part I?