The NoX Router


Page 1: The  NoX  Router

Predictive High-Performance Architecture Research Mavens (PHARM), Department of ECE

The NoX Router

Mitchell Hayenga

Mikko Lipasti

Page 2: The  NoX  Router

2/19 The NoX Router, Micro’11

Overview
• New low-latency router technique
  – Don’t arbitrate or speculate! Encode.
• XOR property: (A^B) ^ B = A
  – Hides arbitration latency
  – Eliminates dead cycles
• The NoX Router
  – Single-cycle, wormhole, mesh implementation
  – Clock frequency competitive with purely speculative designs
  – 2.7% to 34.4% better ED² (energy × delay²) on application traces
  – Up to 9.9% better throughput on synthetic traffic

[Figure: router block diagram showing the control logic, input channel, and switch fabric]
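The XOR identity behind NoX can be checked in a few lines. This minimal Python sketch models each 64-bit flit as an unsigned integer, which is an assumption made only for illustration; `xor_flits` is a hypothetical helper, not part of the original design.

```python
# Minimal sketch: flits modeled as 64-bit unsigned integers (illustrative assumption).
MASK64 = (1 << 64) - 1


def xor_flits(*flits: int) -> int:
    """XOR together any number of 64-bit flit payloads."""
    acc = 0
    for flit in flits:
        acc ^= flit & MASK64
    return acc


a = 0x0123_4567_89AB_CDEF
b = 0xFEDC_BA98_7654_3210

coded = xor_flits(a, b)          # what the link carries when A and B are sent together
print(hex(xor_flits(coded, b)))  # (A^B) ^ B recovers A
print(hex(xor_flits(coded, a)))  # (A^B) ^ A recovers B
```

Because XOR is its own inverse, a receiver holding the combined value plus either original flit can always recover the other one.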

Page 3: The  NoX  Router


Motivation
• Modern on-chip networks
  – Bandwidth plentiful, latency critical
  – Control: complex, speculative, on the critical path
  – Datapath: fast, simple, wire-dominated
• NoX tradeoff
  – Marginal increase in datapath complexity
  – Hide control latency

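[Figure: Virtual Channel Router Pipeline Evolution. Successive pipelines fold the stages LT (link traversal), BW (buffer write), RC/NRC (route computation / next-route lookahead computation), VA (VC allocation), SA (switch allocation), and ST (switch traversal) into fewer cycles; the figure also labels the Intel Teraflops router.]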

Page 4: The  NoX  Router


Switch Arbitration Techniques
• Non-speculative
  – Arbitration occurs before switch traversal
• Speculative switch traversal [Mullins ISCA 2004]
  – Assume contention doesn’t happen
  – Wasted cycle in the event of contention
• Arbiter decides what gets sent on the next cycle

[Figure: switch fabric and control block with timing diagrams (clk, port 0, port 1, grant, valid out, data out over cycles 0 to 4) for the no-contention and contention cases; flits A and B contend, with panels for A winning and for B winning]
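To make the wasted cycle concrete, here is a toy Python model of one output port under speculative switch traversal. It is a sketch under simplifying assumptions, not Mullins' actual design: all requesting flits arrive in the same cycle, a collision wastes exactly that cycle, and a hypothetical fixed-priority arbiter (lowest port index first) then sends the flits one per cycle. The name `speculative_output` is illustrative.

```python
from typing import Dict, List, Optional


def speculative_output(requests: Dict[int, int]) -> List[Optional[int]]:
    """Toy model of one output port under speculative switch traversal.

    `requests` maps input-port index -> flit, all arriving in the same cycle.
    Returns what the output carries each cycle; None marks a wasted cycle.
    Assumes a fixed-priority arbiter (lowest port index first).
    """
    if not requests:
        return []
    if len(requests) == 1:
        # No contention: the speculative transmission succeeds immediately.
        return list(requests.values())
    # Contention: the speculative transmissions collide and the cycle is lost.
    outputs: List[Optional[int]] = [None]
    # Afterwards the arbiter's grants are known, so flits go out one per cycle.
    for port in sorted(requests):
        outputs.append(requests[port])
    return outputs


print(speculative_output({0: 0xA}))          # one flit, one cycle
print(speculative_output({0: 0xA, 1: 0xB}))  # two flits, three cycles
```

Under these assumptions, n > 1 colliding flits need n + 1 cycles, one of which carries no useful data.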

Page 5: The  NoX  Router


Switch Arbitration Techniques
• Non-speculative
  – Arbitration occurs before switch traversal
• Speculative switch traversal [Mullins ISCA 2004]
  – Assume contention doesn’t happen
  – Wasted cycle in the event of contention
• Arbiter decides what gets sent on the next cycle
• Encoding
  – Blindly transmit, XOR within the switch fabric
  – No contention: data sent unmodified
  – Contention: data sent XOR’d
• Arbiter decides what was sent

[Figure: the same switch fabric and timing diagrams (clk, port 0, port 1, grant, valid out, data out over cycles 0 to 4) with encoding; in the contention case the output carries the XOR of A and B, and B wins arbitration]
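The encoding behavior can be sketched the same way. In this hypothetical Python model (the function `nox_output` and the `coded` flag are illustrative, not taken from the design), every pending input drives the XOR switch each cycle, the arbiter picks a winner alongside switch traversal, and the winner stops driving the next cycle. It assumes all contenders arrive together and no new flit joins until they drain, and it uses fixed priority for simplicity, whereas the figure shows B winning.

```python
from typing import Dict, List, Tuple


def nox_output(requests: Dict[int, int]) -> List[Tuple[int, bool]]:
    """Toy model of one NoX output port.

    Every pending input drives the XOR switch each cycle, so the output is the
    XOR of all pending flits. The arbiter (assumed fixed priority, lowest port
    first) picks a winner in parallel; the winner stops driving next cycle.
    Each output is paired with a hypothetical `coded` flag that is True when
    more than one flit contributed to it.
    """
    pending = dict(requests)
    outputs: List[Tuple[int, bool]] = []
    while pending:
        value = 0
        for flit in pending.values():
            value ^= flit  # XOR within the switch fabric
        outputs.append((value, len(pending) > 1))
        winner = min(pending)  # arbitration resolves alongside traversal
        del pending[winner]
    return outputs


print(nox_output({0: 0xA}))          # uncontended flit passes unmodified
print(nox_output({0: 0xA, 1: 0xB}))  # A^B first, then B alone
```

Under these assumptions, n contending flits occupy the output for exactly n cycles with no dead cycle; the first n - 1 outputs are XOR-encoded and the last is plain data.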

Page 6: The  NoX  Router


Receive Logic
• Works upon a simple XOR property
  – (A^B^C) ^ (B^C) = A
• Simple decode
  – Always able to decode by XORing two sequential values
  – Maintains the previous router’s arbitration order/fairness

[Figure: coded flit buffer holding A^B^C, B^C, and C alongside a "Coded" flag column; XORing adjacent entries recovers A and B, and C is read directly]
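The decode rule above can be sketched as follows. This is a minimal Python illustration, not the actual receive logic: each buffer entry is a `(value, coded)` pair, where the `coded` flag is a hypothetical stand-in for the "Coded" column in the figure, and `nox_decode` is an illustrative name. A coded entry holds the XOR of its own flit and every later flit in the same run, so XORing it with the next entry isolates one flit.

```python
from typing import List, Tuple


def nox_decode(buffer: List[Tuple[int, bool]]) -> List[int]:
    """Decode a coded flit buffer by XORing each coded entry with its successor.

    Each entry is (value, coded). A coded entry holds the XOR of itself and every
    later flit in the same run, so entry[i] ^ entry[i + 1] isolates flit i.
    An uncoded entry is already plain data.
    """
    decoded: List[int] = []
    for i, (value, coded) in enumerate(buffer):
        if coded:
            if i + 1 >= len(buffer):
                raise ValueError("coded entry has no successor to decode against")
            decoded.append(value ^ buffer[i + 1][0])
        else:
            decoded.append(value)
    return decoded


A, B, C = 0x1111, 0x2222, 0x4444
buffer = [(A ^ B ^ C, True), (B ^ C, True), (C, False)]
print([hex(f) for f in nox_decode(buffer)])  # recovers A, B, C in arbitration order
```

Only two adjacent entries are ever combined, which matches the slide's point that decode cost does not grow with the number of ports.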

Page 7: The  NoX  Router


Tradeoffs and Scaling
• Arbitration
  – O(log n) delay for most arbiters
• Decode logic
  – Constant with respect to the number of ports
• Switch fabric
  – XOR delay scales slightly worse than a mux/tristate-based solution
  – May not be an issue, since control latency is the critical path

[Figure: router block diagram showing the control logic, input channel, and two switch fabric blocks]

Page 8: The  NoX  Router


The NoX Router
• Network of XORs
• Implementation details
  – 8x8 mesh, 2 mm long 64-bit links
  – Single cycle (router + link)
  – Wormhole
  – Dimension-ordered routing
  – Minimally buffered
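Dimension-ordered routing in the 8x8 mesh can be sketched as below. This is an illustrative Python function, not the router's actual route logic: the X-before-Y order, the direction names (N/S/E/W/LOCAL), and the coordinate convention are all assumptions.

```python
from typing import List, Tuple

MESH_DIM = 8  # 8x8 mesh, per the implementation details above


def dor_route(src: Tuple[int, int], dst: Tuple[int, int]) -> List[str]:
    """Dimension-ordered (X-then-Y) route between two mesh nodes.

    Returns the output port taken at each router along the way, ending with
    'LOCAL' for ejection. Direction names and the X-first order are assumptions.
    """
    for x, y in (src, dst):
        if not (0 <= x < MESH_DIM and 0 <= y < MESH_DIM):
            raise ValueError(f"node ({x}, {y}) is outside the {MESH_DIM}x{MESH_DIM} mesh")
    (sx, sy), (dx, dy) = src, dst
    route = ["E" if dx > sx else "W"] * abs(dx - sx)  # finish the X dimension first
    route += ["N" if dy > sy else "S"] * abs(dy - sy)  # then the Y dimension
    route.append("LOCAL")
    return route


print(dor_route((0, 0), (2, 1)))  # ['E', 'E', 'N', 'LOCAL']
```

Because the route is a pure function of source and destination, each router's output-port choice is known as soon as the head flit arrives, leaving switch arbitration as the remaining control step.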

Page 9: The  NoX  Router


Baseline Designs
• Non-speculative
  – Serial arbitration and switch logic
  – Long cycle time
  – Efficient link utilization
• Speculative techniques [Mullins ISCA 2004]
  – Hides arbitration latency
  – Potential for wasted link bandwidth
  – Spec-Fast and Spec-Accurate [Mullins ASP-DAC 2006]

Page 10: The  NoX  Router


Frequency Analysis
• Overheads present in all designs
  – 248 ps SRAM delay
  – 98 ps link latency

Architecture      Clock Period   Frequency gain vs. Non-Speculative
Non-Speculative   0.92 ns        -
Spec-Fast         0.69 ns        33.3%
Spec-Accurate     0.72 ns        27.7%
NoX               0.76 ns        21.1%
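The percentage column can be reproduced from the clock periods as a frequency ratio against the non-speculative baseline. The Python sketch below does that arithmetic; the helper names are illustrative.

```python
# Clock periods (ns) from the frequency table; Non-Speculative is the baseline.
PERIODS_NS = {
    "Non-Speculative": 0.92,
    "Spec-Fast": 0.69,
    "Spec-Accurate": 0.72,
    "NoX": 0.76,
}


def frequency_ghz(period_ns: float) -> float:
    """Clock frequency in GHz for a period given in nanoseconds."""
    return 1.0 / period_ns


def gain_over_baseline_pct(period_ns: float, baseline_ns: float) -> float:
    """Percentage frequency improvement relative to the baseline period."""
    return (baseline_ns / period_ns - 1.0) * 100.0


baseline = PERIODS_NS["Non-Speculative"]
for name, period in PERIODS_NS.items():
    print(f"{name:16s} {frequency_ghz(period):.2f} GHz  "
          f"{gain_over_baseline_pct(period, baseline):+.1f}%")
```

With the rounded periods, Spec-Accurate comes out at about 27.8% rather than the 27.7% shown. The small gap likely comes from rounding of the published periods.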

Page 11: The  NoX  Router


Synthetic Traffic - Latency

[Figure: two plots of latency versus bandwidth (MB/s/node)]

Page 12: The  NoX  Router


Synthetic Traffic – ED²

[Figure: two plots of ED² versus bandwidth (MB/s/node)]

Page 13: The  NoX  Router


Application Traffic - Latency

Page 14: The  NoX  Router


Application Traffic – ED²

Page 15: The  NoX  Router


Power @ Fixed Bandwidth
• Traffic pattern
  – Uniform random
  – 2 GB/s/node injection rate
• Spec-Fast is saturated at this load
• Switch and link glitching in the speculative designs
• Marginal additional decode power

[Figure: power comparison; the decode component is labeled negligible]
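A fixed bandwidth per node does not mean the same per-cycle load for every design, because each design runs at a different clock period. The Python sketch below converts the 2 GB/s/node rate into flits per cycle under stated assumptions that are not from the slides: 1 GB = 10^9 bytes, one flit equals one 64-bit link width (8 bytes), and each router runs at the clock period listed in its frequency table row.

```python
def injection_flits_per_cycle(bytes_per_s: float, flit_bytes: int, period_ns: float) -> float:
    """Convert a per-node injection bandwidth into flits injected per clock cycle."""
    flits_per_s = bytes_per_s / flit_bytes
    return flits_per_s * period_ns * 1e-9


RATE_BYTES_PER_S = 2e9  # 2 GB/s/node, taking 1 GB = 10^9 bytes (assumption)
FLIT_BYTES = 8          # one 64-bit link width per flit (assumption)

for name, period_ns in [("Non-Speculative", 0.92), ("Spec-Fast", 0.69),
                        ("Spec-Accurate", 0.72), ("NoX", 0.76)]:
    load = injection_flits_per_cycle(RATE_BYTES_PER_S, FLIT_BYTES, period_ns)
    print(f"{name:16s} {load:.3f} flits/cycle/node")
```

Under these assumptions, faster-clocked designs see a lighter offered load per cycle at the same bandwidth, so a design that saturates here does so despite a relatively low per-cycle injection rate.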

Page 16: The  NoX  Router


Area Floorplanning

[Figure: floorplans of the Standard Router and the NoX Router. Both include five 64x4 SRAM input buffers (Port 0 to Port 4); the standard router uses a crossbar and the NoX router an XOR switch, and the figure also shows a decoding and masking block. Labeled dimensions include 140 µm, 70 µm, and 161.2 µm on both floorplans, 101.0 µm on the standard router, and 102.2 µm and 28 µm on the NoX router.]

NoX router: ~17% more area

Page 17: The  NoX  Router


Going Further
• Input speedup
  – What if we could drive two values from an input buffer in a single cycle?
  – The final decode step then has both values available
  – The last packet sees no additional delay from contention at the previous router
• Multi-hop encoded forwarding
  – Don’t decode at every hop; decode when packets diverge
  – Allow new collisions with the “head” flit
  – Requires additional sideband information

[Figure: input speedup example, with a flit buffer holding A^B and B feeding the switch fabric, and A and B labeled as outputs]

Page 18: The  NoX  Router


Conclusion
• New encoding-based low-latency router technique
  – Hides arbitration latency
  – Comparable frequency to speculative switch traversal techniques
  – Eliminates wasted interconnect bandwidth
  – Promising application to multiple router architectures

Page 19: The  NoX  Router


Thanks – Questions?

Page 20: The  NoX  Router


Virtual Channels
• Future work
• Physical channels vs. virtual channels
  – VC router benefits: dynamic bandwidth sharing (performance)
  – VC router negatives: increased arbitration delay (performance), increased buffer energy (power), large unified crossbar (area, power)
• Possible, but the tradeoffs need to be re-evaluated
  – Structuring of input buffers and decode logic
  – VC credit accounting

Page 21: The  NoX  Router


Multi-Flit Support
• Current support is conservative
  – Performs similarly to speculative routers if multi-flit packets collide
  – Not all bad, though:
    • ~70% of packets are single-flit coherence packets
    • Only head-flit collisions matter
    • Requests are all single-flit
• Alternatives
  – Fragment multi-flit packets
  – Provide sufficient buffering space