module 13 interconnect-centric soc platform architecture 조준동 교수 ( 성균관대학교 )

57
Module 13 Interconnect-Centr ic SoC Platform Ar chitecture 조조조 조조 ( 조조조조조조 )

Post on 19-Dec-2015

225 views

Category:

Documents


8 download

TRANSCRIPT

Page 1: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

Module 13Interconnect-Centric SoC Platform Architecture

조준동 교수

( 성균관대학교 )

Page 2: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

2Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

목차 Introduction to network On-chip interconnection network

Network topologies Switching Routing Buffer scheme Congestion Control Protocol

layer On-Chip network examples

Page 3: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

3Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Networks Goal: Communication between computers Eventual Goal: treat collection of computers

as if one big computer, distributed resource sharing

Theme: Different computers must agree on many things Overriding importance of standards and protocols Fault tolerance critical as well

Warning: Terminology-rich environment

Page 4: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

4Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Interconnections (Networks) Examples :

Wide Area Network (ATM): 100-1000s nodes; ~ 5,000 kilometers Local Area Networks (Ethernet): 10-1000 nodes; ~ 1-2 kilometers System/Storage Area Networks (FC-AL): 10-100s nodes;

~ 0.025 to 0.1 kilometers per link

a.k.a.network,communication subnet

a.k.a.end systems,hosts

Interconnect

HW Interface

SW Interface

Link

Node

HW Interface

SW Interface

Link

Node

HW Interface

SW Interface

Link

Node

HW Interface

SW Interface

Link

Node

Interconnection Network

Page 5: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

5Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

HW Interface Issues Where to connect network to computer?

Cache consistent to avoid flushes? (=> memory bus) Latency and bandwidth? (=> memory bus) Standard interface card? (=> I/O bus) MPP => memory bus; LAN, WAN => I/O bus

$

CPU

L2 $

Memory Bus

Memory Bus Adaptor

I/O bus

I/OController

I/OController

Networkideal: high bandwidth, low latency, standard interface

Network

Page 6: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

6Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

SW Interface Issues How to connect network to software?

Programmed I/O?(low latency) DMA? (best for large messages) Receiver interrupted or received polls?

Things to avoid Invoking operating system in common case Operating at uncached memory speed

(e.g., check status of network interface)

Page 7: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

7Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Connecting Multiple Computers Shared Media vs. Switched:

pairs communicate at same time: “point-to-point” connections

Aggregate BW in switched network is many times shared point-to-point faster since no

arbitration, simpler interface Arbitration in Shared

network? Central arbiter for LAN? Listen to check if being used

(“Carrier Sensing”) Listen to check if collision

(“Collision Detection”) Random resend to avoid repeated

collisions; not fair arbitration; OK if low utilization

(A. K. A. data switching interchanges, multistageinterconnection networks,interface message processors)

Switch

Node Node Node

Shared Media (Ethernet)

Node Node

Node Node

Switched Media (CM-5,ATM)

Page 8: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

8Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Networks Background Facets people talk a lot about:

direct (point-to-point) vs. indirect (multi-hop) topology (e.g., bus, ring, DAG)

How switches and nodes are connected routing algorithms

Determines the route from source to destination switching (aka multiplexing)

How a message traverse the route Flow control

Schedules the traversal of the message over time What really matters:

latency bandwidth cost reliability

Page 9: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

9Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

More Network Background Connection of 2 or more networks:

Internetworking 3 cultures for 3 classes of networks

WAN: telecommunications, Internet LAN: PC, workstations, servers cost SAN: Clusters, RAID boxes: latency (System A.N.) or

bandwidth (Storage A.N.) Try for single terminology Motivate the interconnection complexity

incrementally

Page 10: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

10Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Network Performance Measures

Overhead: latency of interface vs. Latency: network

Interconnect

HW Interface

SW Interface

Link

Overhead

LinkBandwidth

Node

HW Interface

SW Interface

Link

Overhead

LinkBandwidth

Node

Bisection Bandwidth

Latency

Page 11: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

11Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Latency

Time(n) = Admission + Channel Occupancy + Routing Delay + Contention Delay Admission

is the time it takes to emit the message into the network. Channel Occupancy

is the time a channel is occupied. Routing Delay

is the delay for the route. Contention Delay

is the delay of a message due to contention.

Page 12: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

12Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Local and Global Bandwidth

b ... raw bandwidth of a link; n ... message size; nE ... size of message envelope; w ... link bandwidth per cycle;

E

nLocal Bandwidth b

n n w

/ sec / /Total Bandwidth Cb bits Cw bits cycle C phits cycle

minimum bandwidth to cut the net into two equal partsBidirectional Bandwidth

Δ ... switching time for each switch in cycles;

wΔ ... bandwidth lost during switching;

C ... total number of channels;

For a kⅹk mesh with bidirectional channels:

Total bandwidth = (4k2 4k)b −Bisection bandwidth = 2kb

Page 13: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

13Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Overhead, BW, Size

Msg Size

Delivered BW

•How big are real messages?

0

0

1

10

100

1,000

Message Size (bytes)

Effe

ctiv

e B

andw

idth

(Mbi

t/sec

) o1,bw1000

o1,bw100

o1,bw10

o500,bw10

o500,bw100

o500,bw1000

o25,bw100

o25,bw1000

o25,bw10

Page 14: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

14Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Basic Definitions Message

is the basic communication entity. Flit

is the basic flow control unit. A message consists of 1 or many flits. Phit

is the basic unit of the physical layer. Direct network

is a network where each switch connects to a node. Indirect network

is a network with switches not connected to any node. Hop

is the basic communication action from node to switch or from switch to switch. Diameter

is the length of the maximum shortest path between any two nodes measured in hops. Routing distance between two nodes

is the number of hops on a route. Average distance

is the average of the routing distance over all pairs of nodes.

Page 15: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

15Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Network design issues Topologies Buffering

Input buffers Output buffers Shared buffers

Routing Source routing Deterministic routing Adaptive routing

Deadlock Control flow Protocol

Page 16: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

16Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Network Topology Structure of the interconnect Determines

Degree: number of links from a node Diameter: max number of links crossed between nodes Average distance: number of hops to random destination Bisection: minimum number of links that separate the

network into two halves (worst case) Warning: these three-dimensional drawings

must be mapped onto chips and boards which are essentially two-dimensional media Elegant when sketched on the blackboard may look

awkward when constructed from chips, cables, boards, and boxes (largely 2D)

Networks should not be interesting!

Page 17: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

17Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Centralized switch:Fully Connected Networks Crossbar and Omega

network

P0

P1

P2

P3

P4

P5

P6

P7

P0

P1

P2

P3

P4

P5

P6

P7

A

B

C

D

Omega network switch box

Page 18: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

18Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Distributed switch Linear Array, Rings, Multidimensional Meshes and Tori

P0 P1 P2 P3

P0 P1 P2 P3

Linear Array

Torus

P0 P1

P2P3

P4 P5

P6 P7

P0 P1 P2

P3 P4 P5

P6 P7 P8

P0 P1 P2

P3 P4 P5

P6 P7 P8

2-d mesh

2-d torus3-d cube

Page 19: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

19Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Distributed switch-cont’d K-ary Tree and k-ary n-dimensional Fat Trees

P0

P1 P2

P3 P4 P5 P6

P0 P1 P2 P3

P4 P5 P6 P7

P8 P9 P10 P11

K-ary Tree 8-node 2-ary fat tree

0

1

2

3Fat node

Page 20: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

20Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Distributed switch-cont’d Butterflies

Multistage: nodes at ends, switches in middle

N/2Butterfly

P0

P1

P2

P3

N/2Butterfly

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

16-node butterfly

Page 21: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

21Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Butterflies All paths equal length Unique path from any input to any output Conflicts that try to avoid Don’t want algorithm to have to know paths Cost of the network

O(N logN) 2-d layout is more difficult than for binary trees Number of long wires grows faster than for trees.

For each source-destination pair there is only one route. Each route blocks many other routes.

Page 22: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

22Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Characteristics of topologies

Number of Nodes

Switch degree

diameter

distance

Network cost

Total bandwidth

Bisection bandwidth

Bus N N 1 1 O(N) b b

crossbar N N 1 1 O(N2) Nb Nb

Linear Array N 2 N-1 ~2/3N O(N) 2(N-1)b 2b

Torus N 2 N/2 ~1/3N O(N) 2Nb 4b

K-ary d-cube Kd d d(k-1)~d½(k-1)

O(N) dNb 2k(d-1)b

K-ary Tree Kd (# of switch: ~ Kd) K+1 2d ~d+2 O(N)2·2(N-1)b

kb

bufferfly 2d (# of switch: ~ 2d-

1d) 2 d+1 d+1 O(Nd) 2ddb (N/2)b

k-ary n-D Fat Tree

Kd (# of switch: ~ Kd-1d)2k 2d ~d O(Nd) 4kddb 2kd-1b

Page 23: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

23Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Projection of multi-d and k-ary Efficient and regular 2-

layout; 2-ary Tree Longest

wires in resource width:11

22d

lW

P P

P P

2-ary 2-cube

P P

P P

P P

P P

P P

P P

P P

P P

P P

P P

P P

P P

P P

P P

P P

P P

P P

P P

P P

P P

P P

P P

P P

P P

2-ary 4-cube

2-ary 5-cube

2-ary 3-cube

P P

P P

P P

P P

P

P P

P

PP P

P

P P

PP P

P P

2-ary Tree

Page 24: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

24Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Topologies of Super Computers

Institution Name# of

nodesBasic

topologyData bits/

link

Networkclock rate

(Mhz)

PeakBW/link

(MB/sec)

Bisection(MB/sec)

Year

Delta 540 2D mesh 16 40 40 640 1991

ThinkingMachines

Intel

CM-532 to2048

Multistagefat tree

4 40 20 10240 1991

Intel Paragon 4 to 2048 2D mesh 16 100 175 6400 1992

IBM SP-2 2 to 512 8 40 40 20480 1993Multistage

fat treeCray

ResearchT2E

16 to2048

16 300? 600 122000 19973D torus

Intel ASCI Red4536(x2CPUs)

800 51600 19962D mesh

IBMASCI Blue

Pacific1336(x4CPUs)

150

SGI 800 200xnodes 1998Fat Multi-D

cubeASCI BlueMountain

1464(x2CPUs)

IBM 115 1999Multistage

OmegaASCI Blue

Horizon144(x8CPUs)

IBM 500 2000Multistage

OmegaSP 1~512(x2~

16 CPUs)

IBM 500 2001Multistage

OmegaASCI White

484(x16CPUs)

Page 25: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

25Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Switching Techniques Circuit Switching

A real or virtual circuit establishes a direct connection between source and destination.

Packet Switching Each packet of a message is routed independently. The

destination address has to be provided with each packet. Store and Forward Packet Switching

The entire packet is stored and then forwarded at each switch. Cut Through Packet Switching

The flits of a packet are pipelined through the network. The packet is not completely buffered in each switch.

Virtual Cut Through Packet Switching The entire packet is stored in a switch only when the header

flit is blocked due to congestion. Wormhole Switching

is cut through switching and all flits are blocked on the spot when the header flit is blocked.

Page 26: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

26Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Connection-Based vs. Connectionless Telephone: operator sets up connection

between the caller and the receiver Once the connection is established, conversation can

continue for hours Share transmission lines over long distances

by using switches to multiplex several conversations on the same lines “Time division multiplexing” divide B/W transmission line

into a fixed number of slots, with each slot assigned to a conversation

Problem: lines busy based on number of conversations, not amount of information sent

Advantage: reserved bandwidth

Page 27: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

27Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Connection-Based vs. Connectionless Connectionless: every package of information

must have an address => packets Each package is routed to its destination by looking at

its address Analogy, the postal system (sending a letter) also called “Statistical multiplexing” Note: “Split phase buses” are sending packets

Page 28: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

28Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Switch Design

Control Lines

I0

I1

I2

I3

O0

O1

O2

O3

I0 I1 I2 I3

O0

O1

O2

O3

RAMAddress

I0I1I2I3

O0O1O2O3

Page 29: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

29Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Switch Scaling Switches are wire dominated Scaling with equal switch size by a factor s:

number of transistors: s2

number of I/O's: s speed of transistor: 1/s speed of wires: 1 delay of the switch: 1

Scaling with equal I/O number by factor s: size of the switch: 1/s2

speed of transistor: 1/s speed of wires: 1/s delay of the switch: 1/s

Page 30: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

30Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Stacked Dimension Switch Simple 2ⅹ2 building block can

be repeatedly used; in k-ary n-cubes we need n 2ⅹ

2 switches instead of nⅹn switches; Changing dimension incurs add

itional delay; Switching is very fast in the sa

me dimension;

2x2

2x2

2x2

Host

Host

Zin

Yin

Xin

Zout

Yout

Xout

Page 31: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

31Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Routing Shared Media

Broadcast to everyone, but switched Media needs real routing. Virtual Circuit: circuit established from source to destination,

message picks the circuit to follow Arithmetic routing

The destination address of the incoming packet is compared with the address of the switch and the packet is routed accordingly. (relative or absolute addresses)

Source based routing The source determines the route and builds a header with one directive for each

switch. The switches strip off the top directive.

Table-driven routing Switches have routing tables, which can be configured.

Destination-based routing: message specifies destination, switch must pick the path

Deterministic routing The route is determined solely by source and destination locations.

Adaptive routing The route can be adapted by the switches to balance the load.

Randomized routing pick between several good paths to balance network load

Minimal routing allows only shortest paths while non-minimal routing allows even longer paths.

Page 32: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

32Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Deterministic Routing Examples mesh: dimension-order routing

(x1, y1) -> (x2, y2) first x = x2 - x1, then y = y2 - y1,

Multi-D cube: edge-cube routing X = xox1x2 . . .xn -> Y = yoy1y2 . . .yn

R = X xor Y Traverse dimensions of differing add

ress in order tree: common ancestor Deadlock free? 001

000

101

100

010 110

111011

Page 33: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

33Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Deadlock Deadlock

Two or several packets mutually block each other and wait for resources, which can never be free.

Livelock A packet keeps moving throug

h the network but never reaches its destination.

Starvation A packet never gets a resource

because it always looses the competition for that resource (fairness).

Page 34: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

34Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Deadlock Situations Head-on deadlock; Nodes stop receiving packets; Contention for switch buffers can occur with store-and-for

ward, virtual-cut-through and wormhole routing. Wormhole routing is particularly sensible.

Cannot occur in butterflies; Cannot occur in trees or fat trees if upward and downwar

d channels are independent; Dimension order routing is deadlock free on k-ary n-array

s but not on tori with any n ≥ 1.

Page 35: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

35Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Deadlock-free Routing Two main approaches:

Restrict the legal routes; Restrict how resources are allocated;

Number the channel cleverly Construct the channel dependence graph Prove that all legal routes follow a strictly

increasing path in the channel dependence graph.

Page 36: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

36Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Turn-Model Routing What are the minimal routing restrictions to make routing deadlock

free?

Three minimal routing restriction schemes: North-last West-first Negative-first

Allow complex, non-minimal adaptive routes. Unidirectional k-ary n-cubes still need virtual channels.

North-last West-first Negative-first

Page 37: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

37Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Adaptive Routing The switch makes routing decisions based

on the load. Fully adaptive routing allows all shortest

paths. Partial adaptive routing allows only a

subset of the shortest path. Non-minimal adaptive routing allows also

non-minimal paths. Hot-potato routing is non-minimal adaptive

routing without packet buffering.

Page 38: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

38Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Store and Forward vs. Cut-Through Store-and-forward policy: each switch waits for the full packet to arriv

e in switch before sending to the next switch (good for WAN) Cut-through routing or worm hole routing: switch examines the heade

r, decides where to send the message, and then starts forwarding it immediately In worm hole routing, when head of message is blocked, message stays

strung out over the network, potentially blocking other messages (needs only buffer the piece of the packet that is sent between switches). CM-5 uses it, with each switch buffer being 4 bits per port.

Cut through routing lets the tail continue when head is blocked, accordioning the whole message into a single switch. (Requires a buffer large enough to hold the largest packet).

Advantage Latency reduces from function of: number of intermediate switches X by

the size of the packet to time for 1st part of the packet to negotiate the switches + the packet size ?interconnect BW

Page 39: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

39Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Buffering-Input Buffering

Head-of line blocking limits output channel utilization to 60%.

Router

Router

Router

Router

Crossbar

Scheduler

Inputports

Outputports

Page 40: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

40Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Buffering-Output Buffering

Router

Router

Router

Router

Control

Inputports

Outputports

Inputports

Inputports

Inputports

Page 41: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

41Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Buffering-Shared Buffer Pool Potential better

utilization of buffers; Speed of memory

becomes limiting factor; Inputports

Controller RAM

Router

Router

Router

Router

Page 42: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

42Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Buffering-Virtual Channel Buffering

Dynamic virtual channel allocation can increase buffer utilization, reduce head-of-line blocking.

E.g. 256 nodes 2-ary butterfly with wormhole routing; 16 flits per link: no virtual channel: output channel saturation at 25% under random traffic; 16 virtual channels: saturation at 80% traffic load;

Crossbar

Inputports

Outputports

Virtual channels

Page 43: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

43Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Congestion Control Packet switched networks do not reserve bandwidth; this leads to co

ntention (connection based limits input) Solution: prevent packets from entering until contention is reduced

(e.g., freeway on-ramp metering lights) Options:

Packet discarding: If packet arrives at switch and no room in buffer, packet is discarded (e.g., UDP)

Flow control: between pairs of receivers and senders; use feedback to tell sender when allowed to send next packet

Back-pressure: separate wires to tell to stop Window: give original sender right to send N packets before getting permissio

n to send more; overlapslatency of interconnection with overhead to send & receive packet (e.g., TCP), adjustable window

Choke packets: aka rate-based? Each packet received by busy switch in warning state sent back to the source via choke packet. Source reduces traffic to that destination by a fixed % (e.g., ATM)

Page 44: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

44Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Protocols: HW/SW Interface Protocol : Sequence of steps for SW Internetworking: allows computers on independent

and incompatible networks to communicate reliably and efficiently; Enabling technologies: SW standards that allow reliable

communications without reliable networks Hierarchy of SW layers, giving each layer responsibility for

portion of overall communications task, called protocol families or protocol suites

Transmission Control Protocol/Internet Protocol (TCP/IP) This protocol family is the basis of the Internet IP makes best effort to deliver; TCP guarantees delivery TCP/IP used even when communicating locally: NFS uses IP

even though communicating across homogeneous LAN

Page 45: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

45Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Software to Send and Receive SW Send steps

1: Application copies data to OS buffer2: OS calculates checksum, starts timer3: OS sends data to network interface HW and says start

SW Receive steps3: OS copies data from network interface HW to OS buffer2: OS calculates checksum, if matches send ACK; if not,

deletes message (sender resends when timer expires)1: If OK, OS copies data to user address space and signals

application to continue

Page 46: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

46Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Protocol Family Concept

Key to protocol families is that communication occurs logically at the same level of the protocol, called peer-to-peer, but is implemented via services at the lower level

Danger is each level increases latency if implemented as hierarchy (e.g., multiple check sums)

Message TH

Message

TH

Message TH

Message

Message TH Message TH TH

Actual

Actual Actual

Actual

Physical

Logical

Logical

Page 47: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

47Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Protocol Family Concept Key to protocol families is that

communication occurs logically at the same level of the protocol, called peer-to-peer,

but is implemented via services at the next lower level

Encapsulation: carry higher level information within lower level “envelope”

Fragmentation: break packet into multiple smaller packets and reassemble

Danger is each level increases latency if implemented as hierarchy (e.g., multiple check sums)

Page 48: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

48Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Connection of Networks Bridges: connect LANs together, passing

traffic from one side to another depending on the addresses in the packet. operate at the Ethernet protocol level usually simpler and cheaper than routers

Routers or Gateways: these devices connect LANs to WANs or WANs to WANs and resolve incompatible addressing. Generally slower than bridges, they operate at the

internetworking protocol (IP) level Routers divide the interconnect into separate smaller

subnets, which simplifies manageability and improves security

Cisco is major supplier; basically special purpose computers

Page 49: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

49Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Protocol layer Physical Layer

Parameters: Physical distance Number of lines Activity control Buffers and pipelining

Data Link Layer Parameters:

Line frequency versus switch frequency Buffering Error correction Power optimization encoding

Page 50: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

50Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Protocol layer-cont’d Network Layer

Parameters: Link layer cell size vs. network layer packet size Network address scheme Routing algorithm Priority classes Error correction

Network Layer Services Packet switched best effort service

Packets are guaranteed to arrive Packet payload may be protected (4 levels of protection) Load dependable delay in the network Load dependable delay at the network access point

Virtual circuit service Guaranteed bandwidth Guaranteed maximum delay Multicast circuits Based on packet switching service

Page 51: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

51Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

Protocol layer-cont’d Session Layer

Parameters: Task level communication primitives Message passing Shared memory based communication Synchronization Error correction

Application Layers Application specific communication services; E.g. the NoC operating system could use: Task/resource database access protocol Task migration protocol

Page 52: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

52Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

On-Chip network examples Nostrum (KTH, Sweden)

Toplogy:2-d mesh Wide (128 bits), short links Non-minimal adaptive hot-

potato routing No buffering Service: Best effort,

guaranteed latency virtual circuits

Four data protection levels at link layer

S

R

S

R

S

R

S

R

S

R

S

R

S

R

S

R

S S S S

Data Link

Network Interface

Network

Transport

Resource

Page 53: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

53Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

On-Chip network examples Æthereal (Philips)

Topology: probably low dimensional tree or mesh, or dedicated

Deterministic source based routing

Wormhole or virtual cut-through switching

Input buffering Selectable connection features:

Integrity Completion Ordering Bounds on latency, throughput an

d jitter

Best effort

Guaranteedthroughput

arbitration

Best effort

Guaranteedthroughput

Low-priority High-priority

buffers switchData path

preemptprogram

Control path

Conceptual view

Hardware view

Page 54: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

54Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

On-Chip network examples SPIN (UPMC/LIP6) in

Paris Topology: fat tree Wormhole: switching Adaptive routing Input buffering Bidirectional 32 bit links Separate control

network for configuration

Rou

ting

Log

ic

10x10Partial

Crossbar

18-flit shared output buffers

4-flit input buffer

Channelsincomingfrom thefathers

Channelsincomingfrom thechildrens

Channelsoutgoing

to thefathers

Channelsoutgoing

to thechildren

Page 55: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

55Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

On-Chip network examples Octagon at ST

and UC San Diego Basic 8-node

architecture Diameter: 2

hops Packet and

circuit switching modes

Source based routing with a 3 bit address in the header

0

1

2

3

4

5

6

7

P0 M0

P1

M1

P2

M2

P3

M3

P4 M4

P5

M5

P7

M7

Memory Processor

Schedulaer Arbiter

MUX/DEMUXLAR

LAR

RequestGenerator

Octagon NodeModel

Network Processor usingOctagon

Ingr

ess

Eng

ress

Page 56: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

56Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

On-Chip network examples XPipes at Bologna Univ.

Source based routing Wormhole switching Long, pipelined links Output buffering Virtual channels Acknowledgment based flow control with retransmission

Page 57: Module 13 Interconnect-Centric SoC Platform Architecture 조준동 교수 ( 성균관대학교 )

57Copyright 2003ⓒ

Interconnect-Centric SoC Platform Architecture

On-Chip network examples Proteo at Tampere university of

technology Objective is to develop a flexibl

e library of communication IPs that support various topologies and routing strategies

Topology: Ring connecting local bus or star based clusters

Virtual cut-through switching Routing: Table based Buffer: input and output bufferi

ng

IP Interfacerouter

Bridge

Interfacerouter

InterfacerouterIP

IP

Interfacerouter

Sub-network 1Sub-

network 3

Sub-network 2