High Performance Switching and Routing
Telecom Center Workshop: Sept 4, 1997.
Techniques for
Fast Packet Buffers
Sundar Iyer, Ramana Rao, Nick McKeown
(sundaes, ramana, nickm)@stanford.edu
Departments of Electrical Engineering & Computer Science, Stanford University
Stanford University 2
Problem Statement Redefined
Motivation: To design an extremely high-speed packet buffer architecture with fast access time and large size.
This talk: An analysis of one such well-known approach.
Characteristics of Packet Buffer Architectures
• The total memory throughput needed is at least 2 × (ingress rate)
• The buffer size is at least R × RTT (line rate times round-trip time)
• The buffers hold one or more FIFOs
• The sequence in which the FIFOs are accessed is determined by an arbiter and is not known a priori
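The first two requirements can be made concrete with a small calculation. This is a minimal sketch: the function names and the example values (a 10 Gb/s line and a 250 ms round-trip time) are illustrative assumptions, not figures from the talk.

```python
def min_memory_throughput_bps(ingress_rate_bps: float) -> float:
    """Each cell is written once and read once, so the memory must sustain 2R."""
    return 2.0 * ingress_rate_bps


def min_buffer_bits(line_rate_bps: float, rtt_s: float) -> float:
    """Buffer size of at least R * RTT, in bits."""
    return line_rate_bps * rtt_s


# Illustrative values only (assumptions, not taken from the slides).
R = 10e9      # line rate in bits per second
RTT = 0.25    # round-trip time in seconds

print(min_memory_throughput_bps(R))  # required memory throughput in b/s
print(min_buffer_bits(R, RTT))       # required buffer size in bits
```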
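Memory Hierarchy of Packet Buffer
[Figure: Arriving packets enter at rate R into an ingress SRAM (cache of FIFO tails, queues 1 to Q) and are written b cells at a time into a large DRAM memory with access time T’. The DRAM is read b cells at a time into an egress SRAM (cache of FIFO heads, queues 1 to Q), from which departing packets leave at rate R. An arbiter issues grants, and a memory management algorithm controls the transfers. Write access time = T = 2T’; read access time = T = 2T’.]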
System Design Parameters
Main Parameters
– SRAM size
– Latency faced by a cell
System Parameters
– I/O bandwidth
– Number of addresses
  • Use a single address on every DRAM
  • Use different addresses on every DRAM
– Use or non-use of DRAM burst mode
– Existence or non-existence of bank conflicts
Today’s Talk…
Optimize Main Parameters
– Minimize latency at the cost of SRAM size
– (Necessity and sufficiency)
…… (later) Minimize SRAM size at the cost of latency
Assumptions on system parameters
• No speedup on I/O
  – I/O = 2R
• Simple address architecture
  – Use a single address from every DRAM
More Assumptions ..
• We shall assume that only cells of a fixed size C arrive in the system
• No use of DRAM Burst Mode
• No bank conflicts
Symmetry Argument
• The analysis and working of the ingress and egress buffer architectures are similar
• We shall analyze only the egress buffer architecture
A Bad Case for the Queues … 1
[Figure: snapshots of the queues at t = 0 through t = 7, with the width w marked.]
A Bad Case for the Queues … 2
[Figure: snapshots of the queues at t = 8 through t = 14, and at t = 17.]
Observation
• There exists some value of “w” for which the buffer does not overflow
• w = qb is one such sufficient value
• The threshold value Ti governs w
[Figure: Q queues of width w, with the threshold Ti and the level b - 1 marked.]
Definitions
• Occupancy
  – The number of cells in the SRAM for a particular queue
• Active Queue
  – A queue whose occupancy is less than the threshold and which has cells present in the DRAM
One More Definition
• Deficit
  – The difference between the threshold Ti and the occupancy of an active queue
  – For a queue that is not active, the deficit is zero
[Figure: a queue with its occupancy, its deficit, the threshold Ti, and the level b - 1 marked.]
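The three definitions (occupancy, active queue, deficit) translate directly into code. The sketch below is only an illustration: the class name `QueueState` and its field names are hypothetical, chosen here for readability.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    """State of one FIFO in the egress cache (hypothetical names)."""
    sram_cells: int   # occupancy: cells of this queue currently in SRAM
    dram_cells: int   # cells of this queue still waiting in DRAM
    threshold: int    # the threshold Ti for this queue

    def is_active(self) -> bool:
        # Active: occupancy below the threshold AND cells present in DRAM.
        return self.sram_cells < self.threshold and self.dram_cells > 0

    def deficit(self) -> int:
        # Deficit: threshold minus occupancy for an active queue, else zero.
        return self.threshold - self.sram_cells if self.is_active() else 0


print(QueueState(sram_cells=2, dram_cells=10, threshold=5).deficit())
print(QueueState(sram_cells=2, dram_cells=0, threshold=5).deficit())
```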
Can we Bound the Maximum Value of the Deficit?
• Define f(i,q)
  – The maximum deficit that a set of i queues can have in a system of q queues
• We are interested in f(1,q)
• f(q,q) < qb …. trivially
Largest Deficit Queue First
Recurrence Equations
• $f(2,q) \ge f(1,q) - b + [f(1,q) - b]$
• $f(3,q) \ge f(2,q) - b + [f(2,q) - b]/2$
• $f(4,q) \ge f(3,q) - b + [f(3,q) - b]/3$
• ……
• $f(q,q) \ge f(q-1,q) - b + [f(q-1,q) - b]/(q-1)$
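Dirty Math..
• $qb > f(q,q)$ … trivially
  $\ge [f(q-1,q) - b] + \frac{f(q-1,q) - b}{q-1}$
  $\ge f(q-1,q)\,\frac{q}{q-1} - b\,\frac{q}{q-1}$
  $\ge \left\{ f(q-2,q)\,\frac{q-1}{q-2} - b\,\frac{q-1}{q-2} \right\}\frac{q}{q-1} - b\,\frac{q}{q-1}$
  $\ge f(q-2,q)\,\frac{q}{q-2} - \frac{bq}{q-2} - \frac{bq}{q-1}$
  $\ge f(q-3,q)\,\frac{q}{q-3} - \frac{bq}{q-3} - \frac{bq}{q-2} - \frac{bq}{q-1}$
  …..
  $\ge f(1,q)\,\frac{q}{1} - bq \sum_{i=1}^{q-1} \frac{1}{i}$
• This gives $f(1,q) \le b[1 + \ln q]$, approximating the sum by $\ln q$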
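The unrolling can be checked numerically. The sketch below rearranges each recurrence step as f(i,q) <= b + f(i+1,q) · i/(i+1), starts from f(q,q) < qb, and works down to f(1,q). The function names are hypothetical. With the exact harmonic sum the unrolled value is b[1 + H(q-1)], which sits slightly above b[1 + ln q]; in the cases tried here the gap stays below b.

```python
import math


def unrolled_bound(b: int, q: int) -> float:
    """Unroll f(i,q) <= b + f(i+1,q) * i/(i+1) from i = q-1 down to i = 1."""
    f = float(q * b)                  # start from the trivial bound f(q,q) < q*b
    for i in range(q - 1, 0, -1):     # i = q-1, q-2, ..., 1
        f = b + f * i / (i + 1)
    return f


def harmonic_form(b: int, q: int) -> float:
    """Closed form of the unrolled bound: b * (1 + sum_{i=1}^{q-1} 1/i)."""
    return b * (1 + sum(1.0 / i for i in range(1, q)))


def log_form(b: int, q: int) -> float:
    """The bound as quoted on the slide: b * (1 + ln q)."""
    return b * (1 + math.log(q))


b, q = 8, 32
print(unrolled_bound(b, q), harmonic_form(b, q), log_form(b, q))
```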
Results
• If the memory management algorithm (MMA)
  – services the queue with the largest deficit,
  – has a simple address architecture,
  – and has no I/O speedup
• then
  – A latency of zero can be guaranteed when the width of the SRAM is b[1 + ln q] + b = b[2 + ln q]
  – And the size of the SRAM is [2 + ln q]qb
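The selection rule itself, serving the queue with the largest deficit, is small enough to sketch. The function name and the tie-breaking choice (lowest index wins) are assumptions made for this illustration; the slides do not specify how ties are broken.

```python
from typing import Optional, Sequence


def largest_deficit_queue(deficits: Sequence[int]) -> Optional[int]:
    """Return the index of the queue to replenish next, or None if no queue has a deficit.

    Ties go to the lowest index (an assumption, not specified in the talk).
    """
    if not deficits or max(deficits) <= 0:
        return None
    return max(range(len(deficits)), key=lambda i: deficits[i])


print(largest_deficit_queue([0, 3, 1, 3]))
print(largest_deficit_queue([0, 0]))
```

In a fuller model, the chosen queue would then receive b cells from the DRAM in one access.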
Necessity Traffic Pattern – b=2, q=8
[Figure: snapshots of the queues at t = 0, t = 8, t = 8 + 8/2, and t = 8 + 8/2 + 8/4, each with the width w marked.]
Necessity Analysis … 1
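• In the 1st iteration
  – $q\left(\frac{b-1}{b}\right)$ queues with deficit 1
• In the 2nd iteration
  – $q\left(\frac{b-1}{b}\right)^2$ queues with deficit 2
• In the x-th iteration
  – $q\left(\frac{b-1}{b}\right)^x = 1$ queue with deficit x
• $x = \log_{b/(b-1)} q = \frac{\ln q}{\ln\left(1 + \frac{1}{b-1}\right)} \approx (b-1)\ln q$, using $\ln(1 + y) \approx y$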
Necessity Analysis ….2
• In the x-th iteration
  – We can delete another b
  – Deficit is $x + b = (b-1)\ln q + b = b\left[1 + \frac{b-1}{b}\ln q\right] \approx b[1 + \ln q]$
• Width of SRAM = b[2 + ln q]
• Size of SRAM = qb[2 + ln q]
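The iteration count and the resulting deficit can be evaluated directly. This is a minimal sketch with hypothetical function names; it assumes b >= 2 so that the logarithm base b/(b-1) is defined. Because ln(1 + y) < y for y > 0, the approximation (b-1) ln q never exceeds the exact count.

```python
import math


def iterations_exact(b: int, q: int) -> float:
    """x such that q * ((b-1)/b)**x = 1, i.e. x = log base b/(b-1) of q. Requires b >= 2."""
    return math.log(q) / math.log(b / (b - 1))


def iterations_approx(b: int, q: int) -> float:
    """The approximation x ~ (b-1) * ln q, from ln(1 + y) ~ y."""
    return (b - 1) * math.log(q)


def worst_case_deficit(b: int, q: int) -> float:
    """Deficit x + b from the necessity argument, using the approximate x."""
    return iterations_approx(b, q) + b


# The traffic pattern example from the talk: b = 2, q = 8.
print(iterations_exact(2, 8))     # the queue count halves each iteration
print(worst_case_deficit(2, 8))
```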
A Dose of Reality
• Typical values
  – b is typically <= 10
  – q = Np, where
    • N = number of ports (for VOQ)
    • p = number of classes per port
• Implementations
  – VOQ
    • N = 32, p = 1, q = 2^5, b = 2^3, SRAM = 700 kb
  – Diffserv
    • N = 32, p = 16, q = 2^9, b = 2^3, SRAM = 17 Mb
  – Intserv
    • Let's not think about it!
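The quoted SRAM sizes can be roughly reproduced from the formula qb[2 + ln q]. The sketch below takes q = Np (32 for VOQ, 512 for Diffserv) and reads b as 2^3 = 8. The 64-byte cell size is an assumption, since the slides do not state it; with that choice the results come out near 0.7 Mb and 17 Mb.

```python
import math


def sram_width_cells(b: int, q: int) -> float:
    """Per-queue SRAM width in cells: b * (2 + ln q)."""
    return b * (2 + math.log(q))


def sram_size_cells(b: int, q: int) -> float:
    """Total SRAM size in cells: q * b * (2 + ln q)."""
    return q * sram_width_cells(b, q)


CELL_BITS = 64 * 8  # assumed cell size of 64 bytes (not stated in the slides)

voq_bits = sram_size_cells(b=8, q=32 * 1) * CELL_BITS
diffserv_bits = sram_size_cells(b=8, q=32 * 16) * CELL_BITS

print(f"VOQ:      {voq_bits / 1e3:.0f} kb")
print(f"Diffserv: {diffserv_bits / 1e6:.1f} Mb")
```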
Future Work
• Discussion on trading off latency for SRAM size
• Analysis of other parameters
  – Relaxing I/O, address constraints
• Implementation Pain
• …. Still a long way to go