jon turner [email protected] jst resilient cell resequencing in terabit routers

19
Jon Turner [email protected] http://www.arl.wustl.edu/~jst Resilient Cell Resequencing in Terabit Routers

Upload: matthew-west

Post on 20-Jan-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

Jon [email protected]

http://www.arl.wustl.edu/~jst

Resilient Cell Resequencing

in Terabit Routers

Page 2: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

2 - Jonathan Turner

Why Resequencing?

Multistage interconnection networks with buffered switch elements and dynamic routing.»scalable, bandwidth-efficient architecture»makes good use of modern CMOS ICs - single chip can

provide 100 Gb/s thruput and buffer thousands of cells

Load Balancing

Stage

Input Line

Cards

Output Line

Cards

Shared Memory Switch

Elements

Page 3: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

3 - Jonathan Turner

Resequencing with Sequence Numbers

Drawbacks»each output needs N resequencing arrays» initialization when line card comes on-line»timeouts needed to cope with lost cells»multicast requires per flow resequencing arrays

Sequence Numbers

AddedReseq. Arrays

indexed by seq. #

Page 4: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

4 - Jonathan Turner

Time-Based Resequencing

Single resequencing buffer per output. Cells held until “age” exceeds threshold (T ). Options for late cells.

»discard (strict resequencing) or buffer (loose)

Timestamps Added

Timestamp Ordered

Reseq. Buffer

Output Buffer

Page 5: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

5 - Jonathan Turner

Henrion’s Strict Resequencer Design

Implemented using linked lists in common memory. Constant time per cell.

arriving cell ts“ready” cells

T slot timing wheel

nowinsert at timestamp

mod T

current time modulo T

Page 6: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

6 - Jonathan Turner

now

Implementing Loose Resequencing

Only approximates loose resequencing.» must still discard “really late” cells.» low cost of large timing wheel allows good approximation

lagnow

T

arriving cell ts1 ts1+T

late arriving cell ts2 ts2+T

“Normal”

insertion range

“ready”cells

“waiting”

cells

next next

lag

new cell inserted at

timestamp+T

Cannot just insert late cells into output list.

Page 7: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

7 - Jonathan Turner

Fast-Forwarding the Lag Pointer Lag pointer must advance to next non-empty list

on every clock tick for constant time operation.»no time to check successive pointers in timing wheel»use fast-forward bits to speed-up process

0101000101001101000000000000010000000000000000000011000101001101

11010011

summary word

bit for current

time

next non-empty timing

wheel slot

Two memory reads suffice to find next slot.» 32 bit words allows range of 1024» 128 bit words allows range of 16,384

Page 8: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

8 - Jonathan Turner

Synchronization Time-based resequencing requires

synchronization of all line cards.» in small routers, requires just a common backplane

signal» in large routers, line cards connected to network only

by optical data cables Requires low-level clock synchronization protocol.

»“master” line card issues periodic broadcast synchronization messages

»network forwards sync messages with constant delay only approximate synchronization is necessary

»new clock master selected on failure Independent line card clocks require adjustments.

»suspend transmission when delaying clock

Page 9: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

9 - Jonathan Turner

1.E-05

1.E-04

1.E-03

1.E-02

1.E-01

1.E+00

0.95 0.96 0.97 0.98 0.99 1.00Input Load

Lat

e Pro

babi

lity

simple random traffic

fixed threshold, T =128

speedup=1.01

1.03

1.05

Performance of Strict Resequencing

1.E-05

1.E-04

1.E-03

1.E-02

1.E-01

1.E+00

1.01 1.03 1.05 1.07 1.09Speedup

Lat

e Pro

babi

lity

simple random traffic

fixed threshold

input load = 1.0

T =64

128

256

Simple random traffic. 3 stage network, 8 port SEs, 512 (shared) cell buffers. 1st stage SEs use round robin load balancing for each input.

Late cells rare with

small speedup

For systems with 10G links,

delay for 256 cells is 10 s.

Page 10: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

10 - Jonathan Turner

0

50

100

150

200

250

300

350

400

450

250 500 750 1000 1250 1500 1750 2000

Time

resequencer occupancy

age of oldest cell

network delay

poor performance

overload

agethreshold

fixed, strict, T=128

Performance on Adversarial Traffic

Growing network delay

Cells discarded when delay exceeds T

Delay drops as SE buffers

drain

Resequencer recovers when

delay drops below T

2:1 overload at

“target” output

Page 11: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

11 - Jonathan Turner

Performance on Bursty Traffic

1.E-05

1.E-04

1.E-03

1.E-02

1.E-01

1.E+00

1 2 3 4 5 6 7 8 9 10Mean Dwell Time

Lat

e Pr

obab

ility speedup=1.1

1.31.5

bursty traffic

strict, fixed, T =128

100% input load. Input picks an output at random – overload at 1 in 4 outputs Stays with target output for geometrically distributed time.

Poor performance even for large

speedups.

Page 12: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

12 - Jonathan Turner

0

50

100

150

200

250

300

350

400

450

250 500 750 1000 1250 1500 1750 2000

Time

agethreshold

overload

age of oldest cell

networkdelay

resequenceroccupancy

fixed, loose, T=128

Loose Reseq. with Adversarial TrafficArriving cells are

younger than oldest waiting cells, so no resequencing errors

Page 13: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

13 - Jonathan Turner

1.E-05

1.E-04

1.E-03

1.E-02

1.E-01

1.E+00

0 5 10 15 20 25 30Mean Dwell Time

Lat

e Pro

babi

lity

speedup=1.1

1.3

1.5

loose,fixed, T =128

bursty traffic

Loose Reseq. with Bursty Traffic

Tolerates about 3x longer bursts than strict reseq.

Page 14: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

14 - Jonathan Turner

Adaptive Resequencing Adjust age threshold to match observed delay.

»parameters: window size (W), short term delay difference bound ()

»variables: max delay in current measurement window (d0)and previous measurement window (d-1)

»age threshold = + max{d0, d-1}

» implement by extending loose, fixed threshold design Theorem. (simplified). If cell c1 enters

interconnection network just before c2 and exits no later than after c2, then adaptive resequencer with W forwards c1 before c2.

Resequencing errors caused by excessive delay variability, rather than large delays.

Page 15: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

15 - Jonathan Turner

0

50

100

150

200

250

300

350

400

450

250 500 750 1000 1250 1500 1750 2000

Time

age threshold

networkdelay

age of oldest cell

resequenceroccupancy

overload adaptive, W==32

Performance on Adversarial Traffic

Arriving cells are younger than oldest waiting cells, so no resequencing errors

Resequencer adds small

increment to network delay.

Age threshold tracks network delay.

Resequencer occupancy stays

bounded.

Page 16: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

16 - Jonathan Turner

Performance on Bursty Traffic

1.E-05

1.E-04

1.E-03

1.E-02

1.E-01

1.E+00

0 10 20 30 40 50 60Mean Dwell Time

Lat

e Pro

babi

lity

bursty traffic, adaptive, =64

speedup=1.1

1.3

1.5

late+input loss

Tolerates about 2x longer bursts than loose reseq.

Less sensitive to extremely large bursts

Performance degrades when

switch buffers fill. Caused by delay variation in first

stage.

Page 17: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

17 - Jonathan Turner

Boosting Performance for Long Bursts

1.E-07

1.E-06

1.E-05

1.E-04

1.E-03

1.E-02

1.E-01

1.E+00

20 40 60 80 100 120 140 160 180 200

Lat

e Pro

babi

lity

first stage buffer capacity

256128

32

64

bursty traffic, dwell=100, speedup=1.2

adaptive

resequencer

512

Limiting first stage buffering cuts variability.

Small first stage buffer gives good performance with

modest “extra” delay

Smallest buffer reduces switch throughput by

2%.

Page 18: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

18 - Jonathan Turner

Speeding up Threshold Reductions Downward age threshold adjustments are delayed

by window mechanism. Can reduce delay using “finer-grained” windows.

»max delay d0,d-1,d-2, . . . in k measurement windows

»age threshold = + max{d0, d-1,d-2, . . .}

Theorem. (simplified). If cell c1 enters interconnection network just before c2 and exits no later than after c2, then adaptive resequencer with (k1)W forwards c1 before c2.

For large k, max delay in threshold adjustment is cut almost in half.

Page 19: Jon Turner Jon.Turner@wustl.edu jst Resilient Cell Resequencing in Terabit Routers

19 - Jonathan Turner

Closing Remarks Adaptive resequencing can virtually eliminate

resequencing errors in multistage networks.»to handle most extreme traffic, need to limit delay

variability in load balancing stages» in systems that regulate the overall flow of traffic,

extreme cases should not arise at all Henrion’s strict resequencer can be modified for

adaptive resequencing – O(1) time per cell. More sophisticated interconnection network can

increase throughput and reduce delay variability.»per destination queues with controlled buffer sharing»per destination flow control and load balancing»timestamp-ordered switch element queues