High Performance Switching and Routing
Telecom Center Workshop: Sept 4, 1997.
Analysis of a Packet Switch with Memories Running Slower than the Line Rate
Sundar Iyer, Amr Awadallah, Nick McKeown
(sundaes,aaa,nickm)@stanford.edu
Departments of Electrical Engineering & Computer Science, Stanford University
http://klamath.stanford.edu/pps
Stanford University 2
Problem Statement
Motivation:
To design an extremely high speed packet switch with memories running slower than the line rate.
Architecture of a PPS
[Figure: a PPS with N=4 input and output ports at line rate R. Each input feeds a demultiplexor and each output is fed by a multiplexor; these connect through k=3 OQ switches over internal links of rate R/k.]
Parallel Packet Switch: Questions
1. Can it behave like a single big output queued switch?
2. Can it provide delay guarantees, strict-priorities, WFQ, …?
Parallel Packet Switch: Results
• If S > 2k/(k+2) ≈ 2, then a PPS can precisely emulate a FIFO output queued switch for all traffic patterns.
• If S > 3k/(k+3) ≈ 3, then a PPS can precisely emulate an OQ switch with WFQ or strict priorities for all traffic patterns.
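The two bounds above are easy to tabulate for a given number of layers k. The sketch below is only an illustration; the function names `fifo_speedup_bound` and `wfq_speedup_bound` are made up here and do not come from the slides.

```python
from fractions import Fraction


def fifo_speedup_bound(k: int) -> Fraction:
    """Threshold 2k/(k+2) quoted on the slide for FIFO OQ emulation."""
    return Fraction(2 * k, k + 2)


def wfq_speedup_bound(k: int) -> Fraction:
    """Threshold 3k/(k+3) quoted on the slide for WFQ / strict priorities."""
    return Fraction(3 * k, k + 3)


if __name__ == "__main__":
    # The thresholds approach 2 and 3 from below as k grows.
    for k in (3, 10, 100):
        print(k, float(fifo_speedup_bound(k)), float(wfq_speedup_bound(k)))
```

For the k=3 example drawn in the architecture figure, the thresholds are 6/5 and 3/2; they only get close to 2 and 3 when k is large.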
Parallel Packet Switch: Results
• If S > 2√N, then a PPS can precisely emulate a multicast FIFO OQ switch.
• If S > 2√(2N), then a PPS can precisely emulate a multicast OQ switch with WFQ or strict priorities for all traffic patterns.
Questions
• Can we have a completely distributed algorithm?
• Can we reduce the speedup further?
– “Two is too much”
• Can we smooth the load across all the middle stage switches?
Completely Distributed Algorithm
• Local Available Output Link Set (LAOL)
• Definition:
– The LAOL consists of the (k/s - 1) “oldest” layers used by an input for that output.
• We can prevent a layer from appearing in the LAOL until another k - k/s + 1 cells have been sent to other layers for that output.
• Result:
– For any given output, a layer is used only after k - k/s + 1 cells to that output have been sent.
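One way to picture the LAOL bookkeeping is a least-recently-used queue of layers kept by each input for each output. The sketch below is an assumption-laden illustration, not the authors' algorithm: the class name `LayerSelector`, the `busy` argument, and the choice to pass the LAOL size in as `laol_size` (rather than derive it from k and s) are all hypothetical.

```python
from collections import deque


class LayerSelector:
    """Per-input state: for each output, layers ordered oldest-used first."""

    def __init__(self, k: int, laol_size: int):
        if not 1 <= laol_size <= k:
            raise ValueError("laol_size must be between 1 and k")
        self.k = k
        self.laol_size = laol_size
        self._order = {}  # output -> deque of layer ids, oldest first

    def laol(self, output):
        """Return the laol_size oldest layers for this output."""
        queue = self._order.setdefault(output, deque(range(self.k)))
        return list(queue)[: self.laol_size]

    def send(self, output, busy=()):
        """Pick the oldest LAOL layer not in `busy` and mark it newest."""
        for layer in self.laol(output):
            if layer not in busy:
                queue = self._order[output]
                queue.remove(layer)
                queue.append(layer)  # now the most recently used layer
                return layer
        raise RuntimeError("every layer in the LAOL is busy")
```

When the oldest layer is always chosen, a layer that has just been used re-enters the LAOL only after k - laol_size further cells for that output; with laol_size = k/s - 1 this is the k - k/s + 1 figure on the slide.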
Conflict Free Ordering
[Figure: Parallel Packet Switch with N=4 inputs at rate R, each with a demultiplexor, sending numbered cells to the layers over links of rate sR/k; one link is labelled 2R. The cell numbering in the figure is not recoverable.]
Re-Sequencing
• A cell might be delayed by as much as N/S time slots.
• Cells might leave out of order.
• A buffer of size Nk/S will be needed to re-sequence cells and prevent out-of-order transmissions.
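A re-sequencing buffer of this kind can be sketched with a min-heap keyed on a per-flow sequence number. This is a minimal illustration under stated assumptions: the class `Resequencer`, the use of sequence numbers starting at 0, and the `capacity` argument (standing in for the Nk/S figure) are not taken from the slides.

```python
import heapq


class Resequencer:
    """Hold out-of-order cells and release them in sequence order."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # assumed bound, e.g. N*k/S cells
        self.next_seq = 0         # next sequence number allowed to leave
        self._heap = []           # (seq, cell) pairs waiting for their turn

    def push(self, seq: int, cell):
        """Accept one cell; return the cells that can now depart, in order."""
        heapq.heappush(self._heap, (seq, cell))
        released = []
        while self._heap and self._heap[0][0] == self.next_seq:
            released.append(heapq.heappop(self._heap)[1])
            self.next_seq += 1
        if len(self._heap) > self.capacity:
            raise OverflowError("re-sequencing buffer exceeded its capacity")
        return released
```

A cell that arrives early simply waits in the heap until every earlier sequence number has departed; unique sequence numbers are assumed.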
A Practical Distributed Algorithm
• If S > 2k/(k+2) ≈ 2, then a PPS with a completely distributed algorithm can precisely emulate a FIFO output queued switch for all traffic patterns. The PPS will have a fixed latency of Nk/S time slots. A re-sequencing buffer of size Nk/S is needed.
PPS with no Speedup
• Speedup = 1
– LAOL is round robin
– |LAOL| = 1
• D(i,l): Number of cells sent by demultiplexor i to layer l
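With a speedup of 1 the layer choice reduces to a per-output round robin, and D(i,l) is just a counter per layer. The sketch below models a single demultiplexor i; the class name `RoundRobinDemux` and its attributes are illustrative assumptions.

```python
from collections import defaultdict


class RoundRobinDemux:
    """One demultiplexor: per-output round robin over k layers."""

    def __init__(self, k: int):
        self.k = k
        self._next_layer = defaultdict(int)  # output -> next layer to use
        self.sent = [0] * k                  # sent[l] plays the role of D(i, l)

    def dispatch(self, output) -> int:
        """Send one cell for `output`; return the layer it was sent to."""
        layer = self._next_layer[output]
        self._next_layer[output] = (layer + 1) % self.k
        self.sent[layer] += 1
        return layer
```

Because each output has its own pointer, cells for different outputs can land on the same layer in the same period, which is why the per-layer count D(i,l) is worth tracking.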
Buffer Degree
• Degree of Buffer ()
[Figure: a demultiplexor holding cells a to e in buffers ahead of its links of rate sR/k; one label reads cR.]
Buffered AIL Set (BAIL)
• Buffered Available Input Link Set (BAIL)
– “Set of layers whose buffer for layer l holds fewer cells (including the one in transmission) than the buffer degree”
– “It is the set of layers which can start sending the arriving cell between time n and n + k”
– Until now we have only considered a PPS with buffer degree 0
Claim
• BAIL is never empty
– The buffer never overflows for some buffer degree
– LAOL is always satisfied
Buffer Occupancy Sequence
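[Figure: timeline of buffer occupancy with the instants t_1, t_2, …, t_{i-1}, t_i, t-k+1 and t marked; the occupancy values are not recoverable.]
• The last of the i cells left at least by time t-k+1.
I ≥ (t - k + 1 - t_i)/k ≥ (t - t_i)/k - 1
• D(i,l) = I +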
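The second inequality on this slide follows from splitting the fraction and dropping the positive term 1/k. Written out, with the same symbols I, t, t_i and k used on the slide:

```latex
\[
I \;\ge\; \frac{t - k + 1 - t_i}{k}
  \;=\; \frac{t - t_i}{k} - 1 + \frac{1}{k}
  \;\ge\; \frac{t - t_i}{k} - 1 .
\]
```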
Buffer Occupancy Sequence (continued)
= N gives a contradiction.
Observations
• Each cell reaches the middle stage switch with a variable input delay D_i, where D_i is between 1 and N.
• If every cell is delayed at the input of the middle stage switches by a further N - D_i, then all cells reach the outputs of the middle stage in order.
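The observation amounts to padding every cell's delay up to the same total N, so that departure order equals arrival order. The small simulation below illustrates this under simplifying assumptions: the function `departure_order`, the tuple layout of a cell, and the example delays are all made up for the sketch.

```python
def departure_order(cells, n, pad=True):
    """Return cell names ordered by the time they clear the delay stage.

    cells: list of (arrival_slot, input_delay, name) in arrival order,
           with 1 <= input_delay <= n.
    pad:   if True, add the extra n - input_delay described on the slide.
    """
    timed = []
    for arrival, delay, name in cells:
        if not 1 <= delay <= n:
            raise ValueError("input delay must be between 1 and n")
        extra = (n - delay) if pad else 0
        timed.append((arrival + delay + extra, name))
    # Python's sort is stable, so ties keep arrival order.
    timed.sort(key=lambda item: item[0])
    return [name for _, name in timed]


if __name__ == "__main__":
    cells = [(0, 4, "a"), (1, 1, "b"), (2, 2, "c")]
    print(departure_order(cells, n=4, pad=False))
    print(departure_order(cells, n=4, pad=True))
```

With padding, every cell clears the stage exactly n slots after it arrived, so the variable delays can no longer reorder cells.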
Symmetry Argument
• Demultiplexors
– Cells arrive at rate R
– Each cell has a property: output
– Cells to the same output are written in a round robin manner
– Cells leave at link rate R
– The buffer is used to prevent temporary load on the same middle stage switch
– Max Delay = N
Symmetry Argument (continued)
• Multiplexors
– Cells need to be read in at rate R
– Each cell has a property: input
– Cells from the same input are read in a round robin manner
– Cells leave at a rate k(R/k) = R
– The buffer is used to re-order cells and send them in the correct order
– Max Delay = N
Buffered PPS: Results
• A PPS with a completely distributed algorithm, no speedup, and a buffer degree of N can precisely emulate a FIFO output queued switch for all traffic patterns within a delay bound of 2N time slots.
Conclusions
– Implementation
• Timestamps
• Sequence Numbers
– Open questions
• Making QoS practical.
• Making multicasting practical.