analysis of a packet switch with memories running slower than the line rate

High Performance Switching and Routing, Telecom Center Workshop: Sept 4, 1997.

Analysis of a Packet Switch with Memories Running Slower than the Line Rate

Sundar Iyer, Amr Awadallah, Nick McKeown

(sundaes,aaa,nickm)@stanford.edu

Departments of Electrical Engineering & Computer Science, Stanford University

http://klamath.stanford.edu/pps


Problem Statement

Motivation:

To design an extremely high-speed packet switch whose memories run slower than the line rate.


Architecture of a PPS

[Figure: A PPS with N=4 ports, each running at line rate R. Each input feeds a demultiplexor that spreads cells over k=3 parallel OQ switches (the middle-stage layers) on links of rate R/k; at each output, a multiplexor combines the cells from the k layers back onto a link of rate R.]
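To see why the layers can use slower memory, here is a back-of-envelope sketch. It assumes (our assumption, not stated on the slide) that an output-queued switch's memory must absorb up to N writes and one read per cell time, and that each internal link runs at S·R/k; the function name and the example values are hypothetical.

```python
def pps_rates(R: float, N: int, k: int, S: float = 1.0) -> tuple[float, float, float]:
    """Back-of-envelope rates (same units as R) for an N-port PPS with k layers.

    Assumes an OQ switch memory must sustain N writes plus 1 read per cell time.
    """
    internal_link = S * R / k                   # each demultiplexor->layer and layer->multiplexor link
    single_oq_memory = (N + 1) * R              # one big OQ switch running at line rate R
    per_layer_memory = (N + 1) * internal_link  # each layer is an OQ switch at port rate S*R/k
    return internal_link, single_oq_memory, per_layer_memory


# Example: the N=4, k=3 configuration, with R = 10 (e.g. Gb/s) and no speedup.
link, oq_mem, layer_mem = pps_rates(R=10.0, N=4, k=3)
print(f"internal link {link:.2f}, single OQ memory {oq_mem:.1f}, per-layer memory {layer_mem:.2f}")
```

Under these assumptions each layer's memory needs only S/k of the bandwidth of a single OQ switch, which stays below the line-rate design as long as S < k.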


Parallel Packet Switch: Questions

1. Can it behave like a single big output-queued switch?

2. Can it provide delay guarantees, strict priorities, WFQ, …?


Parallel Packet Switch: Results

• If S > 2k/(k+2) ≈ 2, then a PPS can precisely emulate a FIFO output-queued switch for all traffic patterns.

• If S > 3k/(k+3) ≈ 3, then a PPS can precisely emulate an OQ switch with WFQ or strict priorities for all traffic patterns.
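The two bounds can be tabulated directly from the formulas above. This small sketch (the function names are ours) uses exact fractions to show why they are quoted as roughly 2 and 3: both grow with the number of layers k but never reach those values.

```python
from fractions import Fraction


def fifo_speedup_bound(k: int) -> Fraction:
    """Speedup above which a PPS emulates a FIFO OQ switch: 2k/(k+2)."""
    return Fraction(2 * k, k + 2)


def qos_speedup_bound(k: int) -> Fraction:
    """Speedup above which a PPS emulates an OQ switch with WFQ/strict priorities: 3k/(k+3)."""
    return Fraction(3 * k, k + 3)


for k in (3, 8, 32, 1024):
    f, q = fifo_speedup_bound(k), qos_speedup_bound(k)
    # Both bounds stay strictly below 2 and 3 and approach them as k grows.
    print(f"k={k:5d}  FIFO: {float(f):.3f}  WFQ/priority: {float(q):.3f}")
```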


Parallel Packet Switch: Results

• If S > 2√N, then a PPS can precisely emulate a multicast FIFO OQ switch.

• If S > 2√(2N), then a PPS can precisely emulate a multicast OQ switch with WFQ or strict priorities for all traffic patterns.
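Unlike the unicast bounds, which approach a constant as k grows, these multicast bounds grow with the square root of the number of ports N. A quick sketch (function names are ours) makes the difference concrete.

```python
import math


def multicast_fifo_speedup_bound(N: int) -> float:
    """Speedup above which a PPS emulates a multicast FIFO OQ switch: 2*sqrt(N)."""
    return 2 * math.sqrt(N)


def multicast_qos_speedup_bound(N: int) -> float:
    """Speedup above which a PPS emulates a multicast OQ switch with WFQ/priorities: 2*sqrt(2N)."""
    return 2 * math.sqrt(2 * N)


for N in (4, 32, 1024):
    print(f"N={N:5d}  FIFO: {multicast_fifo_speedup_bound(N):6.2f}  "
          f"WFQ/priority: {multicast_qos_speedup_bound(N):6.2f}")
```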


Questions

• Can we have a completely distributed algorithm?

• Can we reduce the speedup further?
– “Two is too much”

• Can we smooth out the load on all the middle-stage switches?


Completely Distributed Algorithm

• Local Available Output Link Set (LAOL)

• Definition:

– LAOL consists of the (k - k/S + 1) “oldest” layers used by an input for that output, i.e., the layers it has used least recently for that output.

• We can prevent a layer from appearing in the LAOL until another k/S - 1 cells have been sent to other layers for that output.

• Result:
– For any given output, a layer is used again only after k/S - 1 further cells to that output have been sent.
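A minimal sketch of how one input could maintain its LAOL, assuming the LAOL for an output is simply its m least recently used layers for that output, with m passed in as `laol_size`; the class, its methods and the `unavailable` argument are hypothetical. Excluding the k - m most recently used layers means a layer only re-enters the LAOL after k - m further cells to that output have gone to other layers.

```python
from collections import deque


class LaolDemultiplexor:
    """Hypothetical demultiplexor state for one input of a PPS with k layers.

    For every output it keeps the layers ordered from least to most recently
    used; the LAOL for that output is taken to be the `laol_size` oldest ones.
    """

    def __init__(self, num_outputs: int, k: int, laol_size: int) -> None:
        if not 1 <= laol_size <= k:
            raise ValueError("laol_size must be between 1 and k")
        self.laol_size = laol_size
        # order[j][0] is the layer used least recently for output j.
        self.order = [deque(range(k)) for _ in range(num_outputs)]

    def laol(self, output: int) -> list[int]:
        """The laol_size oldest layers for this output."""
        return list(self.order[output])[: self.laol_size]

    def send(self, output: int, unavailable: frozenset = frozenset()) -> int:
        """Pick the oldest LAOL layer not in `unavailable` and mark it as just used.

        `unavailable` stands in for layers whose input link is still busy.
        """
        for layer in self.laol(output):
            if layer not in unavailable:
                self.order[output].remove(layer)
                self.order[output].append(layer)  # now the most recently used layer
                return layer
        raise RuntimeError("no LAOL layer has a free input link")


# With laol_size = 1 the choice degenerates to round robin for each output.
demux = LaolDemultiplexor(num_outputs=4, k=3, laol_size=1)
print([demux.send(output=0) for _ in range(6)])  # [0, 1, 2, 0, 1, 2]
```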


Conflict-Free Ordering

[Figure: Conflict-free ordering in a PPS with N=4 ports at line rate R. Numbered cells are shown on the links of rate SR/k between the demultiplexors and the middle-stage layers.]


Re-Sequencing

• A cell might be delayed by as much as N/S time slots.

• Cells might leave in the wrong order.

• A re-sequencing buffer of size Nk/S is needed to prevent out-of-order transmission.
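One way to realise such a buffer, sketched under the assumption that each cell carries a sequence number giving its intended departure order at the output (the Conclusions list sequence numbers and timestamps as implementation options). The class and its `capacity` parameter are hypothetical.

```python
import heapq


class Resequencer:
    """Hypothetical re-sequencing buffer at one output of a PPS.

    Each cell is assumed to carry a sequence number giving its intended
    departure order at this output (e.g. derived from an arrival timestamp).
    Cells that arrive early are held until every earlier cell has been released.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # e.g. on the order of Nk/S cells
        self.next_seq = 0
        self.held: list[tuple[int, str]] = []  # min-heap keyed by sequence number

    def arrive(self, seq: int, cell: str) -> list[str]:
        """Accept one cell from a layer; return the cells that can now depart, in order."""
        if len(self.held) >= self.capacity:
            raise OverflowError("re-sequencing buffer overflow")
        heapq.heappush(self.held, (seq, cell))
        released = []
        while self.held and self.held[0][0] == self.next_seq:
            released.append(heapq.heappop(self.held)[1])
            self.next_seq += 1
        return released


buf = Resequencer(capacity=8)
print(buf.arrive(1, "b"))  # [] : cell 0 has not arrived yet
print(buf.arrive(0, "a"))  # ['a', 'b']
print(buf.arrive(2, "c"))  # ['c']
```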


A Practical Distributed Algorithm

• If S > 2k/(k+2) ≈ 2, then a PPS with a completely distributed algorithm can precisely emulate a FIFO output-queued switch for all traffic patterns. The PPS has a fixed latency of Nk/S time slots, and a re-sequencing buffer of size Nk/S is needed.
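Plugging numbers into this result shows the trade-off: more speedup shortens the fixed latency. The helper below is a hypothetical sketch, and N = 32, k = 8 are example values rather than figures from the slides.

```python
from fractions import Fraction


def distributed_pps(N: int, k: int, S: Fraction) -> tuple[bool, Fraction]:
    """Hypothetical helper: check the speedup condition and return the fixed latency.

    Returns (emulates_fifo_oq, latency), where latency = Nk/S time slots; the
    same quantity, Nk/S cells, is the required re-sequencing buffer size.
    """
    emulates = S > Fraction(2 * k, k + 2)
    latency = Fraction(N * k) / S
    return emulates, latency


print(distributed_pps(N=32, k=8, S=Fraction(2)))     # (True, Fraction(128, 1))
print(distributed_pps(N=32, k=8, S=Fraction(3, 2)))  # (False, Fraction(512, 3))
```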


PPS with no Speedup

• Speedup = 1
– LAOL is round robin
– |LAOL| = 1

• D(i,l): Number of cells sent by demultiplexor i to layer l
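With |LAOL| = 1, each demultiplexor sends successive cells for a given output to the layers in round-robin order. In the minimal simulation below (the function name and the random traffic are hypothetical), the per-output counts on any two layers differ by at most one, so, summed over the N outputs, the values of D(i,l) for a fixed i can differ by at most N.

```python
import random


def round_robin_counts(N: int, k: int, outputs: list[int]) -> list[int]:
    """Simulate one demultiplexor i with |LAOL| = 1 (per-output round robin).

    `outputs` lists the destination output of each arriving cell, in order.
    Returns D[l], the number of cells this demultiplexor sent to layer l.
    """
    next_layer = [0] * N  # round-robin pointer for each output
    D = [0] * k
    for out in outputs:
        layer = next_layer[out]
        D[layer] += 1
        next_layer[out] = (layer + 1) % k
    return D


rng = random.Random(1)
N, k = 4, 3
arrivals = [rng.randrange(N) for _ in range(1000)]
D = round_robin_counts(N, k, arrivals)
print(D, "spread:", max(D) - min(D))  # spread never exceeds N
```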


Buffer Degree

• Degree of Buffer ()

[Figure: a demultiplexor with a buffer for each layer, each buffer feeding a link of rate SR/k; cells labelled a to e are shown in the buffers.]


Buffered AIL Set (BAIL)

• Buffered Available Input Link Set (BAIL)

– “Set of layers which have less than cells in the buffer (including transmission) for layer l”

– It is the set of layers that can start sending the arriving cell between time n and n + k.

– Till now we have only considered a PPS with =0
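A minimal sketch of the BAIL test, assuming the demultiplexor keeps a per-layer count of queued cells (including the one being transmitted) and compares it with the threshold from the definition, passed in here as a parameter named `threshold`. The helper names, and picking the oldest LAOL layer in the intersection, are illustrative rather than taken from the slides.

```python
from typing import Optional


def bail(occupancy: list[int], threshold: int) -> set[int]:
    """Layers whose demultiplexor buffer holds fewer than `threshold` cells.

    occupancy[l] counts cells queued for layer l, including the one in transmission.
    """
    return {l for l, cells in enumerate(occupancy) if cells < threshold}


def choose_layer(laol: list[int], occupancy: list[int], threshold: int) -> Optional[int]:
    """Oldest LAOL layer that is also in the BAIL, or None if they do not intersect."""
    available = bail(occupancy, threshold)
    for layer in laol:  # laol is ordered oldest first
        if layer in available:
            return layer
    return None


occupancy = [2, 0, 3, 1]  # k = 4 layers
print(sorted(bail(occupancy, threshold=2)))          # [1, 3]
print(choose_layer([2, 3], occupancy, threshold=2))  # 3
```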


Claim

• BAIL is never empty

– The buffer never overflows for some
– LAOL is always satisfied


Buffer Occupancy Sequence

[Figure: buffer-occupancy timeline for layer l, marking the times t, t - k + 1, t_i, t_{i-1}, …, t_2 and t_1.]

• The last of the i cells had left by time t - k + 1 at the latest.

$I \ge \frac{t - k + 1 - t_i}{k} \ge \frac{t - t_i}{k} - 1$

• D(i,l) = I +

c


Buffer Occupancy Sequence …

[Figure: buffer-occupancy timeline for layer l, as on the previous slide.]

= N gives a contradiction.


Observations

• Each cell reaches a middle-stage switch with a variable input delay D_i, where 1 ≤ D_i ≤ N.

• If every cell is delayed at the input of the middle-stage switches by a further N - D_i, then every cell sees the same total delay N, so they all reach the outputs of the middle stage in order.
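The padding argument is easy to check numerically. The sketch below (names hypothetical, one cell arriving per time slot for simplicity) adds N - D_i slots to each cell's variable input delay D_i, so every cell sees a total delay of N and order is preserved, whereas unpadded cells can overtake each other.

```python
import random


def middle_stage_arrivals(arrivals: list[int], input_delays: list[int], N: int,
                          pad: bool) -> list[int]:
    """Time each cell reaches the middle stage, optionally padded by N - D_i slots."""
    times = []
    for a, d in zip(arrivals, input_delays):
        assert 1 <= d <= N, "input delay must lie in 1..N"
        times.append(a + d + ((N - d) if pad else 0))
    return times


rng = random.Random(7)
N = 4
arrivals = list(range(8))                        # one cell per time slot
delays = [rng.randint(1, N) for _ in arrivals]   # variable input delay D_i
padded = middle_stage_arrivals(arrivals, delays, N, pad=True)
print(padded == [a + N for a in arrivals])       # True: every cell is delayed exactly N
print(middle_stage_arrivals([0, 1], [3, 1], N, pad=False))  # [3, 2]: order inverted
```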


Symmetry Argument

• Demultiplexors
– Cells arrive at rate R
– Each cell has a property: output
– Cells to the same output are written in a round-robin manner
– Cells leave at link rate R
– The buffer is used to absorb temporary load on the same middle-stage switch
– Max Delay = N


Symmetry Argument …

• Multiplexors
– Cells need to be read in at rate R
– Each cell has a property: input
– Cells from the same input are read in a round-robin manner
– Cells leave at a rate k(R/k) = R
– The buffer is used to re-order cells and send them in the correct order
– Max Delay = N


Buffered PPS: Results

• A PPS with a completely distributed algorithm, no speedup, and a buffer degree of N can precisely emulate a FIFO output-queued switch for all traffic patterns within a delay bound of 2N time slots (at most N at the demultiplexor plus N at the multiplexor).


Conclusions

– Implementation

• Timestamps
• Sequence Numbers

– Open questions

• Making QoS practical.
• Making multicasting practical.
