
High Performance Switching and Routing Telecom Center Workshop: Sept 4, 1997.

Techniques for Fast Packet Buffers

Sundar Iyer, Ramana Rao, Nick McKeown
(sundaes, ramana, nickm)@stanford.edu
Departments of Electrical Engineering & Computer Science, Stanford University

Stanford University 2

Problem Statement Redefined

Motivation:

To design an extremely high-speed packet buffer architecture with fast access time and large size.

This talk: an analysis of one such well-known approach.

Stanford University 3

Characteristics of Packet Buffer Architectures

• The total throughput needed is at least 2R, twice the ingress rate R, since every cell is written once and read once

• The size of the buffer is at least R * RTT (a worked example follows this list)

• The buffers consist of one or more FIFOs

• The sequence in which the FIFOs are accessed is determined by an arbiter and is unknown a priori
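To get a feel for these two requirements, here is a short back-of-the-envelope calculation. The 40 Gb/s line rate and 0.25 s round-trip time are illustrative values I chose, not figures from the talk:

```python
# Back-of-the-envelope packet-buffer sizing (illustrative values only).
R_bps = 40e9    # line (ingress) rate R, in bits per second -- assumed value
RTT_s = 0.25    # round-trip time -- assumed value

buffer_bits = R_bps * RTT_s     # buffer size >= R * RTT
throughput_bps = 2 * R_bps      # one write plus one read per cell

print(f"buffer size      >= {buffer_bits / 1e9:.0f} Gb")       # 10 Gb
print(f"memory bandwidth >= {throughput_bps / 1e9:.0f} Gb/s")  # 80 Gb/s
```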

Stanford University 4

Memory Hierarchy of Packet Buffer

[Figure: memory hierarchy of the packet buffer. Arriving packets enter at rate R into an ingress SRAM that caches the tails of FIFOs 1..Q; blocks of b cells are transferred to a large DRAM with access time T', and blocks of b cells are fetched from the DRAM into an egress SRAM that caches the FIFO heads, from which departing packets leave at rate R in response to arbiter grants. Write access time = read access time = T = 2T'. A memory management algorithm (MMA) decides which queue's block to transfer next.]
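To make the structure concrete, here is a minimal sketch of the egress half of this hierarchy in Python. The b-cell blocks, per-queue DRAM FIFOs, and SRAM head cache follow the figure; the deque-based storage and the method names are my own illustrative choices, not anything specified in the talk:

```python
from collections import deque

class EgressBuffer:
    """Sketch: per-queue FIFOs held in DRAM plus a small per-queue
    SRAM head cache that is replenished b cells at a time."""

    def __init__(self, num_queues: int, b: int):
        self.b = b
        self.dram = [deque() for _ in range(num_queues)]  # FIFO bodies in DRAM
        self.sram = [deque() for _ in range(num_queues)]  # cached FIFO heads in SRAM

    def enqueue(self, q: int, cell) -> None:
        # Ingress path omitted; cells are assumed to already sit in DRAM.
        self.dram[q].append(cell)

    def refill(self, q: int) -> None:
        # One DRAM access moves a block of up to b cells into the head cache.
        for _ in range(self.b):
            if self.dram[q]:
                self.sram[q].append(self.dram[q].popleft())

    def read(self, q: int):
        # An arbiter grant reads one cell from the SRAM head cache.
        return self.sram[q].popleft() if self.sram[q] else None
```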

Stanford University 5

System Design Parameters

Main Parameters
– SRAM Size
– Latency faced by a cell

System Parameters
– I/O Bandwidth
– Number of addresses
  • Use single address on every DRAM
  • Use different addresses on every DRAM
– Use/Non-Use of DRAM Burst Mode
– (Non-)Existence of Bank conflicts

Stanford University 6

Today’s Talk…

Optimize Main Parameters
– Minimize latency at cost of SRAM size (Necessity and Sufficiency)
– …… (later) Minimize SRAM size at cost of latency

Assumptions on system parameters
• No speedup on I/O
  – I/O = 2R
• Simple address architecture
  – Use single address from every DRAM

Stanford University 7

More Assumptions ..

• We shall assume that only cells of a fixed size “C” arrive in the system

• No use of DRAM Burst Mode

• No bank conflicts

Stanford University 8

Symmetry Argument

• The ingress and egress buffer architectures are similar in both operation and analysis

• We shall analyze only the egress buffer architecture

Stanford University 9

A Bad Case for the Queues …1

[Figure: snapshots of the queues at t = 0 through t = 7 under an adversarial request pattern; w marks the SRAM width.]

Stanford University 10

A Bad Case for the Queues … 2

[Figure: the same scenario continued at t = 8 through t = 14, and then at t = 17.]

Stanford University 11

Observation

• There exists some value of “w” for which the buffer does not overflow

• w = qb is one such sufficient value

• The threshold value “Ti” governs “w”

[Figure: a queue annotated with the SRAM width w, the threshold Ti, and b - 1; there are Q such queues.]

Stanford University 12

Definitions

• Occupancy
  – The number of cells in the SRAM for a particular queue

• Active Queue
  – An active queue is one which has an occupancy less than the threshold and has cells present for it in the DRAM

Stanford University 13

One More Definition

• Deficit
  – Defined as the difference between the threshold ‘T’ and the occupancy of an active queue
  – For a queue which is not active, the deficit is zero

[Figure: a queue annotated with its occupancy, the threshold Ti, and b - 1; the deficit is the gap between the occupancy and the threshold.]
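In code, this definition amounts to a one-line rule. The function below is purely illustrative; the argument names are mine, and cells_in_dram stands in for the slide's "has cells in the DRAM present for it":

```python
def deficit(occupancy: int, threshold: int, cells_in_dram: int) -> int:
    """Deficit per the talk's definition: for an active queue (occupancy
    below the threshold and cells still waiting in DRAM) it is
    threshold - occupancy; for a non-active queue it is zero."""
    active = occupancy < threshold and cells_in_dram > 0
    return threshold - occupancy if active else 0
```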

Stanford University 14

Can we Bound the Maximum Value of the Deficit?

• Define f(i,q)
  – The maximum deficit that a set of “i” queues can have in a system of “q” queues

• We are interested in f(1,q)

• f(q,q) < qb …. trivially

Stanford University 15

Largest Deficit Queue First

Recurrence Equations

• f(2,q) >= f(1,q) - b + [f(1,q) - b]/1
• f(3,q) >= f(2,q) - b + [f(2,q) - b]/2
• f(4,q) >= f(3,q) - b + [f(3,q) - b]/3
• ……
• f(q,q) >= f(q-1,q) - b + [f(q-1,q) - b]/(q-1)
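The following toy simulation gives a feel for how a largest-deficit-queue-first replenishment policy keeps deficits bounded. It is a sketch under simplifying assumptions of my own (every queue always has cells waiting in DRAM, the arbiter's reads are random rather than adversarial, and the per-queue cache size doubles as the refill target), so the observed maximum deficit is typically well below the analytical worst-case bound b(1 + ln q):

```python
import math
import random

def simulate_ldqf(q: int = 32, b: int = 8, steps: int = 200_000, seed: int = 0):
    """Toy model of the egress head caches under largest-deficit-first
    replenishment. Illustrative only; not the paper's formal model."""
    rng = random.Random(seed)
    threshold = b * (2 + math.log(q))   # per-queue SRAM size from the talk
    occupancy = [threshold] * q         # start with every head cache full
    max_deficit_seen = 0.0

    for t in range(steps):
        # Each time slot the arbiter drains one cell from some queue
        # (an empty cache would be a miss; simply skipped in this sketch).
        i = rng.randrange(q)
        if occupancy[i] > 0:
            occupancy[i] -= 1

        # Every b time slots the MMA refills the queue with the largest
        # deficit with a block of b cells fetched from DRAM.
        if t % b == b - 1:
            deficits = [threshold - occ for occ in occupancy]
            j = max(range(q), key=lambda k: deficits[k])
            max_deficit_seen = max(max_deficit_seen, deficits[j])
            occupancy[j] = min(threshold, occupancy[j] + b)

    return max_deficit_seen, b * (1 + math.log(q))

if __name__ == "__main__":
    observed, bound = simulate_ldqf()
    print(f"max deficit observed ~ {observed:.1f}, bound b(1 + ln q) ~ {bound:.1f}")
```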

Stanford University 16

Dirty Math..

• qb > f(q,q)  …  trivially
       >= [f(q-1,q) - b] + [f(q-1,q) - b]/(q-1)
       = f(q-1,q) * q/(q-1) - b * q/(q-1)
       >= {f(q-2,q) * (q-1)/(q-2) - b * (q-1)/(q-2)} * q/(q-1) - b * q/(q-1)
       = f(q-2,q) * q/(q-2) - b * q/(q-2) - b * q/(q-1)
       >= f(q-3,q) * q/(q-3) - b * q/(q-3) - b * q/(q-2) - b * q/(q-1)
       …
       >= f(1,q) * q/1 - b * q * (1/1 + 1/2 + … + 1/(q-1))

• Rearranging, f(1,q) < b * [1 + (1/1 + 1/2 + … + 1/(q-1))] ≈ b[1 + ln q]

• This gives f(1,q) <= b[1 + ln q]
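Restated in standard notation (this is my reconstruction of the slide's algebra; the sum is the harmonic series, approximated by ln q):

```latex
\begin{align*}
f(k,q) &\ge \frac{k}{k-1}\,\bigl(f(k-1,q)-b\bigr), \qquad k = 2,\dots,q,\\
qb \;>\; f(q,q) &\ge q\,f(1,q) \;-\; bq\sum_{i=1}^{q-1}\frac{1}{i}
\quad\Longrightarrow\quad
f(1,q) \;<\; b\Bigl(1+\sum_{i=1}^{q-1}\frac{1}{i}\Bigr) \;\approx\; b\,(1+\ln q).
\end{align*}
```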

Stanford University 17

Results

• If the MMA
  – services the queue with the largest deficit,
  – has a simple address architecture,
  – and no I/O speedup,

• then
  – a latency of zero can be guaranteed when the width of the SRAM is b[1 + ln q] + b = b[2 + ln q],
  – and the size of the SRAM is [2 + ln q]qb

Stanford University 18

Necessity Traffic Pattern – b=2, q=8

[Figure: the necessity traffic pattern for b = 2, q = 8, shown at t = 0, t = 8, t = 8 + 8/2, and t = 8 + 8/2 + 8/4; w marks the SRAM width at each stage.]

Stanford University 19

Necessity Analysis … 1

• In the 1st iteration
  – q((b-1)/b) queues with deficit 1

• In the 2nd iteration
  – q((b-1)/b)^2 queues with deficit 2

• In the xth iteration
  – q((b-1)/b)^x = 1 queues with deficit x

• x = log_(b/(b-1)) q = ln q / ln(1 + 1/(b-1)) ≈ (b-1) ln q   (use ln(1+x) ≈ x)

Stanford University 20

Necessity Analysis ….2

• In the xth iteration
  – We can delete another “b”
  – The deficit is x + b = (b-1) ln q + b = b[1 + ((b-1)/b) ln q] ≈ b[1 + ln q]

• Width of SRAM = b[2 + ln q]
• Size of SRAM = qb[2 + ln q]
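In standard notation, the counting above gives (my reconstruction, using the slide's approximation ln(1 + x) ≈ x):

```latex
\[
x \;=\; \log_{\frac{b}{b-1}} q
  \;=\; \frac{\ln q}{\ln\!\bigl(1+\frac{1}{b-1}\bigr)}
  \;\approx\; (b-1)\ln q,
\qquad
\text{deficit} \;=\; x + b
  \;=\; b\Bigl(1+\tfrac{b-1}{b}\ln q\Bigr)
  \;\approx\; b\,(1+\ln q).
\]
```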

Stanford University 21

A Dose of Reality

• Typical values
  – “b” is typically <= 10
  – q = Np, where
    • N = # of ports (for VOQ)
    • p = number of classes per port

• Implementations (a quick arithmetic check follows this list)
  – VOQ
    • N = 32, p = 1, q = 2^5, b = 2^3, SRAM = 700 kb
  – Diffserv
    • N = 32, p = 16, q = 2^9, b = 2^3, SRAM = 17 Mb
  – Intserv
    • Let's not think about it!
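The two SRAM figures can be reproduced from the sizing result qb[2 + ln q] cells. The 64-byte cell size below is my assumption (the slide does not state it), but it makes both numbers come out as quoted:

```python
import math

def sram_bits(q: int, b: int, cell_bytes: int = 64) -> float:
    """Total SRAM from the talk's result: q * b * (2 + ln q) cells.
    cell_bytes = 64 is an assumed cell size, not stated on the slide."""
    cells = q * b * (2 + math.log(q))
    return cells * cell_bytes * 8

# VOQ:      N = 32, p = 1  -> q = 2**5, b = 2**3
# Diffserv: N = 32, p = 16 -> q = 2**9, b = 2**3
print(f"VOQ:      {sram_bits(2**5, 2**3) / 1e3:.0f} kb")  # ~716 kb  (slide: 700 kb)
print(f"Diffserv: {sram_bits(2**9, 2**3) / 1e6:.1f} Mb")  # ~17.3 Mb (slide: 17 Mb)
```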

Stanford University 22

Future Work

• Discussion on trading off latency for SRAM size

• Analysis of other parameters
  – Relaxing I/O, address constraints

• Implementation Pain

• …. Still a long way to go
