scinet caltech-slac experiments

68
SCinet Caltech-SLAC experiments netlab.caltech.edu/FAST SC2002 Baltimore, Nov 2002 Prototype C. Jin, D. Wei Theory D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA) Experiment/facilities Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh CERN: O. Martin, P. Moroni Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip DataTAG: E. Martelli, J. P. Martin-Flatin Internet2: G. Almes, S. Corbato Level(3): P. Fernes, R. Struble SCinet: G. Goddard, J. Patton SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J. Navratil, J. Williams StarLight: T. deFanti, L. Winkler Major sponsors ARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF Acknowledgments

Upload: kuniko

Post on 15-Jan-2016

38 views

Category:

Documents


0 download

DESCRIPTION

SC2002 Baltimore, Nov 2002. SCinet Caltech-SLAC experiments. Acknowledgments. netlab.caltech.edu/FAST. Prototype C. Jin, D. Wei Theory D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA) Experiment/facilities - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: SCinet Caltech-SLAC experiments

SCinet Caltech-SLAC experiments

netlab.caltech.edu/FAST

SC2002 Baltimore, Nov 2002

PrototypeC. Jin, D. Wei

TheoryD. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)

Experiment/facilities Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S.

Ravot (Caltech/CERN), S. Singh CERN: O. Martin, P. Moroni Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip DataTAG: E. Martelli, J. P. Martin-Flatin Internet2: G. Almes, S. Corbato Level(3): P. Fernes, R. Struble SCinet: G. Goddard, J. Patton SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J.

Navratil, J. Williams StarLight: T. deFanti, L. Winkler

Major sponsorsARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF

Acknowledgments

Page 2: SCinet Caltech-SLAC experiments

FAST Protocols for Ultrascale Networks

netlab.caltech.edu/FAST

Internet: distributed feedback control system TCP: adapts sending rate to congestion AQM: feeds back congestion information

Rf (s)

Rb’(s)

x

))((1

lll

l ctyc

p

)()(1)( tan)(

)()(1-2

tqtttT

wx iid

tqtxi

ii ii

ii

y

pq

TCP AQM

Theory

Calren2/Abilene

Chicago

Amsterdam

CERN

Geneva

SURFNet

StarLight

WAN in LabCaltech

research & production networks

Multi-Gbps50-200ms delay

Experiment

155Mb/s

slowstart

equilibrium

FASTrecovery

FASTretransmit

timeout

10Gb/s

Implementation

Students Choe (Postech/CIT) Hu (Williams) J. Wang (CDS) Z.Wang (UCLA) Wei (CS)

Industry Doraiswami (Cisco) Yip (Cisco)

Faculty Doyle (CDS,EE,BE) Low (CS,EE) Newman (Physics) Paganini (UCLA)

Staff/Postdoc Bunn (CACR) Jin (CS) Ravot (Physics) Singh (CACR)

Partners CERN, Internet2, CENIC, StarLight/UI, SLAC, AMPATH, Cisco

People

Page 3: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Outline

Motivation Theory

TCP/AQM TCP/IP

Experimental results

Page 4: SCinet Caltech-SLAC experiments

netlab.caltech.edu

HEP high speed network

… that must change

Page 5: SCinet Caltech-SLAC experiments

netlab.caltech.edu

HEP Network (DataTAG)

NLNLSURFnet

GENEVA

UKUKSuperJANET4

ABILENE

ABILENE

ESNETESNET

CALREN

CALREN

ItItGARR-B

GEANT

NewYork

FrFrRenater

STAR-TAP

STARLIGHT

Wave

Triangle

2.5 Gbps Wavelength Triangle 2002 10 Gbps Triangle in 2003

Newman (Caltech)

Page 6: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Network upgrade 2001-06

’01155

’02622

’032.5

’04 5

’05 10

Page 7: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Projected performance

Ns-2: capacity = 155Mbps, 622Mbps, 2.5Gbps, 5Gbps, 10Gbps100 sources, 100 ms round trip propagation delay

’01155

’02622

’032.5

’04 5

’05 10

J. Wang (Caltech)

Page 8: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Projected performance

Ns-2: capacity = 10Gbps100 sources, 100 ms round trip propagation delay

FAST TCP/RED

J. Wang (Caltech)

Page 9: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Outline

Motivation Theory

TCP/AQM TCP/IP

Experimental results

Page 10: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Congestion control

xi(t)

pl(t)

Example congestion measure pl(t) Loss (Reno) Queueing delay (Vegas)

Page 11: SCinet Caltech-SLAC experiments

netlab.caltech.edu

TCP/AQM

Congestion control is a distributed asynchronous algorithm to share bandwidth

It has two components TCP: adapts sending rate (window) to congestion AQM: adjusts & feeds back congestion information

They form a distributed feedback control system Equilibrium & stability depends on both TCP and AQM And on delay, capacity, routing, #connections

pl(t)

xi(t)TCP: Reno Vegas

AQM: DropTail RED REM/PI AVQ

Page 12: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Network model

F1

FN

G1

GL

Rf(s)

Rb’(s)

TCP Network AQM

x y

q p

lieR lis

lif link uses source if

lieR lislib link uses source if H

Page 13: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Vegas model

F1

FN

G1

GL

Rf(s)

Rb’(s)

TCP Network AQM

x y

q p

1)(

l

ll c

tyG

ii

ii

dtqtx

i tTF

)()(

21sgn

)(

1

Page 14: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Methodology

Protocol (Reno, Vegas, RED, REM/PI…)

Equilibrium Performance

Throughput, loss, delay

Fairness Utility

Dynamics Local stability Cost of stabilization

))( ),(( )1(

))( ),(( )1(

txtpGtp

txtpFtx

Page 15: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Summary: duality model

cRx

xUs

ssxs

subject to

)( max0

Flow control problem

TCP/AQM Maximize utility with different utility functions

Primal-dual algorithm

))( ),(( )1(

))( ),(( )1(

txtpGtp

txtpFtx

Reno,

VegasDropTail, RED, REM

Theorem (Low 00): (x*,p*) primal-dual optimal iff 0 ifequality with ** lll pcy

Page 16: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Equilibrium of VegasNetwork

Link queueing delays: pl

Queue length: clpl

Sources

Throughput: xi

E2E queueing delay : qi

Packets buffered:

Utility funtion: Ui(x) = i di log x Proportional fairness

iiii dqx

Page 17: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Persistent congestion

Vegas exploits buffer process to compute prices (queueing delays)

Persistent congestion due to Coupling of buffer & price Error in propagation delay estimation

Consequences Excessive backlog Unfairness to older sources

Theorem (Low, Peterson, Wang ’02)

A relative error of i in propagation delay estimation distorts the utility function to

iiiiiiiii xdxdxU log)1()(ˆ

Page 18: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Validation (L. Wang, Princeton)

Source rates (pkts/ms)# src1 src2 src3 src4 src51 5.98 (6) 2 2.05 (2) 3.92 (4)3 0.96 (0.94) 1.46 (1.49) 3.54 (3.57)4 0.51 (0.50) 0.72 (0.73) 1.34 (1.35) 3.38 (3.39)5 0.29 (0.29) 0.40 (0.40) 0.68 (0.67) 1.30 (1.30) 3.28

(3.34)

# queue (pkts) baseRTT (ms)1 19.8 (20) 10.18 (10.18)2 59.0 (60) 13.36 (13.51)3 127.3 (127) 20.17 (20.28)4 237.5 (238) 31.50 (31.50)5 416.3 (416) 49.86 (49.80)

Page 19: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Methodology

Protocol (Reno, Vegas, RED, REM/PI…)

Equilibrium Performance

Throughput, loss, delay

Fairness Utility

Dynamics Local stability Cost of stabilization

))( ),(( )1(

))( ),(( )1(

txtpGtp

txtpFtx

Page 20: SCinet Caltech-SLAC experiments

netlab.caltech.edu

TCP/RED stability

Small effect on queue AIMD Mice traffic Heterogeneity

Big effect on queue Stability!

Page 21: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Stable: 20ms delay

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 100000

10

20

30

40

50

60

70Window

time (ms)

Win

dow

(pk

ts)

individual window

Window

Ns-2 simulations, 50 identical FTP sources, single link 9 pkts/ms, RED marking

Page 22: SCinet Caltech-SLAC experiments

netlab.caltech.edu

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 100000

100

200

300

400

500

600

700

800Instantaneous queue

time (ms)

Inst

anta

neou

s qu

eue

(pkt

s)

Queue

Stable: 20ms delay

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 100000

10

20

30

40

50

60

70Window

time (ms)

Win

dow

(pk

ts)

individual window

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 100000

10

20

30

40

50

60

70Window

time (ms)

Win

dow

(pk

ts)

individual window

average window

Window

Ns-2 simulations, 50 identical FTP sources, single link 9 pkts/ms, RED marking

Page 23: SCinet Caltech-SLAC experiments

netlab.caltech.edu

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 100000

10

20

30

40

50

60

70Window

time (10ms)

Win

dow

(pk

ts)

individual window

Unstable: 200ms delay

Window

Ns-2 simulations, 50 identical FTP sources, single link 9 pkts/ms, RED marking

Page 24: SCinet Caltech-SLAC experiments

netlab.caltech.edu

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 100000

10

20

30

40

50

60

70Window

time (10ms)

Win

dow

(pk

ts)

individual window

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 100000

10

20

30

40

50

60

70Window

time (10ms)

Win

dow

(pk

ts)

individual window

average window

Unstable: 200ms delay

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 100000

100

200

300

400

500

600

700

800Instantaneous queue

time (10ms)

Inst

anta

neou

s qu

eue

(pkt

s)

QueueWindow

Ns-2 simulations, 50 identical FTP sources, single link 9 pkts/ms, RED marking

Page 25: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Other effects on queue

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 100000

100

200

300

400

500

600

700

800Instantaneous queue

time (ms)

Inst

anta

neou

s qu

eue

(pkt

s)

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 100000

100

200

300

400

500

600

700

800Instantaneous queue

time (10ms)

Inst

anta

neou

s qu

eue

(pkt

s)

20ms

200ms

0 10 20 30 40 50 60 70 80 90 1000

100

200

300

400

500

600

700

800Instantaneous queue (50% noise)

time (sec)

inst

anta

neou

s qu

eue

(pkt

s)

30% noise

0 10 20 30 40 50 60 70 80 90 1000

100

200

300

400

500

600

700

800Instantaneous queue (50% noise)

time (sec)

inst

anta

neou

s qu

eue

(pkt

s)

30% noise

0 10 20 30 40 50 60 70 80 90 1000

100

200

300

400

500

600

700

800

time (sec)

Instantaneous queue (pkts)

inst

anta

neou

s qu

eue

(pkt

s)

avg delay 16ms

0 10 20 30 40 50 60 70 80 90 1000

100

200

300

400

500

600

700

800

time (sec)

Instantaneous queue (pkts)

inst

anta

neou

s qu

eue

(pkt

s)

avg delay 208ms

Page 26: SCinet Caltech-SLAC experiments

netlab.caltech.edu

222

2

3

33

)1(4

)1 )(

2

-(Nc

N

c

Theorem (Low et al, Infocom’02) Reno/RED is stable if

Stability: Reno/RED

F1

FN

G1

GL

Rf(s)

Rb’(s)

TCP Network AQM

x y

q p

TCP: Small Small c Large N

RED: Small Large delay

Page 27: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Stability: scalable control

F1

FN

G1

GL

Rf(s)

Rb’(s)

TCP Network AQM

x y

q p

lll

l ctyc

tp )(1

)()(

)(tq

mii

iii

i

extx

Theorem (Paganini, Doyle, Low, CDC’01) Provided R is full rank, feedback loop is locally stable for arbitrary delay, capacity, load and topology

Page 28: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Stability: Vegas

ii

ii

dtqtx

i tTx

)()(

21sgn

)(

1

F1

FN

G1

GL

Rf(s)

Rb’(s)

TCP Network AQM

x y

q p

lll

l ctyc

tp )(1

)(

Theorem (Choe & Low, Infocom’03) Provided R is full rank, feedback loop is locally stable if

), ;( max 20 kMTx ii

Page 29: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Stability: Stabilized Vegas

)()(1)( tan)(

1 )()(1-

2tqtt

tTx iid

tqtxi ii

ii

F1

FN

G1

GL

Rf(s)

Rb’(s)

TCP Network AQM

x y

q p

lll

l ctyc

tp )(1

)(

Theorem (Choe & Low, Infocom’03) Provided R is full rank, feedback loop is locally stable if

),( max aTx ii

Page 30: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Stability: Stabilized Vegas

)()(1)( tan)(

1 )()(1-

2tqtt

tTx iid

tqtxi ii

ii

F1

FN

G1

GL

Rf(s)

Rb’(s)

TCP Network AQM

x y

q p

lll

l ctyc

tp )(1

)(

Application Stabilized TCP with current routers Queueing delay as congestion measure has right scaling Incremental deployment with ECN

Page 31: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Fast AQM Scalable TCP

Equilibrium properties Uses end-to-end delay and loss Achieves any desired fairness, expressed by utility function Very high utilization (99% in theory)

Stability properties Stability for arbitrary delay, capacity, routing & load Robust to heterogeneity, evolution, … Good performance

Negligible queueing delay & loss (with ECN) Fast response

Page 32: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Implementation

Sender-side kernel modification Build on

Reno, NewReno, SACK, Vegas New insights

Difficulties due to Effects ignored in theory Large window size

First demonstration in SuperComputing Conf, Nov 2002 Developers: Cheng Jin & David Wei FAST Team & Partners

Page 33: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Outline

Motivation Theory

TCP/AQM TCP/IP

Experimental results WAN in Lab

Page 34: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Network

(Sylvain Ravot, caltech/CERN)

Page 35: SCinet Caltech-SLAC experiments

netlab.caltech.edu

FAST BMPS

Internet2Land Speed

Record

FAST

1 2

1

2

7

9

10

Gen

eva-

Sunn

yval

e

Baltim

ore-S

unnyvale

#flows

FAST Standard MTU Throughput averaged over > 1hr

Page 36: SCinet Caltech-SLAC experiments

netlab.caltech.edu

FAST BMPS

flows BmpsPeta

ThruputMbps

Distancekm

Delayms

MTUB

Durations

TransferGB

Path

Alaska-Amsterdam

9.4.2002

1 4.92 401 12,272 - - 13 0.625 Fairbanks, AL – Amsterdam,

NL

MS-ISI29.3.2000

2 5.38 957 5,626 - 4,470 82 8.4 MS, WA – ISI, Va

Caltech-SLAC19.11.2002

1 9.28 925 10,037 180 1,500 3,600 387 CERN -Sunnyvale

Caltech-SLAC19.11.2002

2 18.03 1,797 10,037 180 1,500 3,600 753 CERN -Sunnyvale

Caltech-SLAC18.11.2002

7 24.17 6,123 3,948 85 1,500 21,600 15,396 Baltimore -Sunnyvale

Caltech-SLAC19.11.2002

9 31.35 7,940 3,948 85 1,500 4,030 3,725 Baltimore -Sunnyvale

Caltech-SLAC20.11.2002

10 33.99 8,609 3,948 85 1,500 21,600 21,647 Baltimore -Sunnyvale

Mbps = 106 b/s; GB = 230 bytes

Page 37: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Aggregate throughput

1 flow 2 flows 7 flows 9 flows 10 flows

Average utilization

95%

92%

90%

90%

88%FAST Standard MTU Utilization averaged over > 1hr

1hr 1hr 6hr 1.1hr 6hr

Page 38: SCinet Caltech-SLAC experiments

SCinet Caltech-SLAC experiments

netlab.caltech.edu/FAST

SC2002 Baltimore, Nov 2002

Experiment

Sunnyvale Baltimore

Chicago

Geneva

3000km 1000km

70

00

km

C. Jin, D. Wei, S. LowFAST Team and Partners

Internet: distributed feedbacksystem Rf (s)

Rb’(s)

x

p

TCP AQM

Theory

FAST TCP Standard MTU Peak window = 14,255 pkts Throughput averaged over > 1hr 925 Mbps single flow/GE card

9.28 petabit-meter/sec 1.89 times LSR

8.6 Gbps with 10 flows 34.0 petabit-meter/sec 6.32 times LSR

21TB in 6 hours with 10 flows

Implementation Sender-side modification Delay based

Highlights

1 2

1

2

7

9

10G

enev

a-Sunnyv

ale

Baltim

ore-

Sunn

yval

eFA

ST

I2 L

SR

#flows

Page 39: SCinet Caltech-SLAC experiments

netlab.caltech.edu

FAST vs Linux TCP

flows BmpsPeta

ThruputMbps

Distancekm

Delayms

MTUB

Durations

TransferGB

Path

Linux TCPtxqueulen=100

1 1.86 185 10,037 180 1,500 3600 78 CERN - Sunnyvale

Linux TCPtxqueulen=10000

1 2.67 266 10,037 180 1,500 3600 111 CERN - Sunnyvale

FAST19.11.2002

1 9.28 925 10,037 180 1,500 3600 387 CERN -Sunnyvale

Linux TCPtxqueulen=100

2 3.18 317 10,037 180 1,500 3600 133 CERN - Sunnyvale

Linux TCPtxqueulen=10000

2 9.35 931 10,037 180 1,500 3600 390 CERN - Sunnyvale

FAST19.11.2002

2 18.03 1,797 10,037 180 1,500 3600 753 CERN -Sunnyvale

Mbps = 106 b/s; GB = 230 bytes; Delay = propagation delayLinux TCP expts: Jan 28-29, 2003

Page 40: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Aggregate throughput

Linux TCP Linux TCP FAST

Average utilization

19%

27%

92%FAST Standard MTU Utilization averaged over 1hr

txq=100 txq=10000

95%

16%

48%

Linux TCP Linux TCP FAST

2G

1G

Page 41: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Effect of MTU

(Sylvain Ravot, Caltech/CERN)

Linux TCP

Page 42: SCinet Caltech-SLAC experiments

SCinet Caltech-SLAC experiments

netlab.caltech.edu/FAST

SC2002 Baltimore, Nov 2002

PrototypeC. Jin, D. Wei

TheoryD. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)

Experiment/facilities Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S.

Ravot (Caltech/CERN), S. Singh CERN: O. Martin, P. Moroni Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip DataTAG: E. Martelli, J. P. Martin-Flatin Internet2: G. Almes, S. Corbato Level(3): P. Fernes, R. Struble SCinet: G. Goddard, J. Patton SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J.

Navratil, J. Williams StarLight: T. deFanti, L. Winkler

Major sponsorsARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF

Acknowledgments

Page 43: SCinet Caltech-SLAC experiments

netlab.caltech.edu

FAST URL’s

FAST websitehttp://netlab.caltech.edu/FAST/

Cottrell’s SLAC websitehttp://www-iepm.slac.stanford.edu/monitoring/bulk/fast

Page 44: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Outline

Motivation Theory

TCP/AQM TCP/IP Non-adaptive sources Content distribution

Implementation WAN in Lab

Page 45: SCinet Caltech-SLAC experiments

1

20

1

20

fiber spool

OPM

Max path length = 10,000 kmMax one-way delay = 50ms

S

S

S

S

R

R

H

R

: server

: router

electroniccrossconnect(Cisco 15454)

S

S

S

S

R

R

EDFA EDFA

500 km

Page 46: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Unique capabilities

WAN in Lab Capacity: 2.5 – 10 Gbps Delay: 0 – 100 ms round trip

Configurable & evolvable Topology, rate, delays, routing Always at cutting edge

Risky research MPLS, AQM, routing, …

Integral part of R&A networks Transition from theory, implementation,

demonstration, deployment Transition from lab to marketplace

Global resource

(a) Physical network

R1

R2

R10

1

3

20

2

18

19

1

3

20

2

4

19

Page 47: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Unique capabilities

WAN in Lab Capacity: 2.5 – 10 Gbps Delay: 0 – 100 ms round trip

Configurable & evolvable Topology, rate, delays, routing Always at cutting edge

Risky research MPLS, AQM, routing, …

Integral part of R&A networks Transition from theory, implementation,

demonstration, deployment Transition from lab to marketplace

Global resource

R1 R2

R3R10

(b) Logical network

1 23

419

20

Page 48: SCinet Caltech-SLAC experiments

netlab.caltech.edu

WAN in Lab Capacity: 2.5 – 10 Gbps Delay: 0 – 100 ms round trip

Configurable & evolvable Topology, rate, delays, routing Always at cutting edge

Risky research MPLS, AQM, routing, …

Integral part of R&A networks Transition from theory, implementation,

demonstration, deployment Transition from lab to marketplace

Global resource

Unique capabilities

Calren2/Abilene

Chicago

Amsterdam

CERN

Geneva

SURFNet

StarLight

WAN in LabCaltech

research & production networks

Multi-Gbps50-200ms delay

Experiment

Page 49: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Coming together …

Clear & presentNeed

Resources

Page 50: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Clear & presentNeed

Coming together …

Resources

Page 51: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Clear & presentNeed

Coming together …

Resources FASTProtocols

Page 52: SCinet Caltech-SLAC experiments

FAST Protocols for Ultrascale Networks

netlab.caltech.edu/FAST

Internet: distributed feedback control system TCP: adapts sending rate to congestion AQM: feeds back congestion information

Rf (s)

Rb’(s)

x

))((1

lll

l ctyc

p

)()(1)( tan)(

)()(1-2

tqtttT

wx iid

tqtxi

ii ii

ii

y

pq

TCP AQM

Theory

Calren2/Abilene

Chicago

Amsterdam

CERN

Geneva

SURFNet

StarLight

WAN in LabCaltech

research & production networks

Multi-Gbps50-200ms delay

Experiment

155Mb/s

slowstart

equilibrium

FASTrecovery

FASTretransmit

timeout

10Gb/s

Implementation

Students Choe (Postech/CIT) Hu (Williams) J. Wang (CDS) Z.Wang (UCLA) Wei (CS)

Industry Doraiswami (Cisco) Yip (Cisco)

Faculty Doyle (CDS,EE,BE) Low (CS,EE) Newman (Physics) Paganini (UCLA)

Staff/Postdoc Bunn (CACR) Jin (CS) Ravot (Physics) Singh (CACR)

Partners CERN, Internet2, CENIC, StarLight/UI, SLAC, AMPATH, Cisco

People

Page 53: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Backup slides

Page 54: SCinet Caltech-SLAC experiments

netlab.caltech.edu

TCP Congestion States

Established

Slow Start

High Throughput

ack for syn/ack cwnd > ssthreshpacing? gamma?

Page 55: SCinet Caltech-SLAC experiments

netlab.caltech.edu

From Slow Start to High Throughput

Linux TCP handshake differs from the TCP specification

Is 64 KB too small for ssthresh? 1 Gbps x 100 ms = 12.5 MB !

What about pacing? Gamma parameter in Vegas

Page 56: SCinet Caltech-SLAC experiments

netlab.caltech.edu

TCP Congestion States

Established

Slow Start

High Throughput

FAST’sRetransmitTime-out *

3 dup acks

retransmision timer fired

Page 57: SCinet Caltech-SLAC experiments

netlab.caltech.edu

High Throughput

Update cwnd as follows: +1 pkts in queue < + kq’ - 1 otherwise

Packet reordering may be frequent Disabling delayed ack can generate

many dup acks Is THREE the right number for Gbps?

Page 58: SCinet Caltech-SLAC experiments

netlab.caltech.edu

TCP Congestion States

Established

Slow Start

High Throughput

FAST’sRecovery

FAST’sRetransmit

3 dup acks

retransmit packetrecord snd_nxt

reduce cwnd/ssthresh

snd_una > recorded snd_nxt

send packet if in_flight < cwnd

Page 59: SCinet Caltech-SLAC experiments

netlab.caltech.edu

When Loss Happens

Reduce cwnd/ssthresh only when loss is due to congestion

Maintain in_flight and send data when in_flight < cwnd

Do FAST’s Recovery until snd_una >= recorded snd_nxt

Page 60: SCinet Caltech-SLAC experiments

netlab.caltech.edu

TCP Congestion States

Established

Slow Start

High Throughput

FAST’sRecovery

FAST’sRetransmitTime-out *

3 dup acks

retransmit packetrecord snd_nxt

reduce cwnd/ssthresh

retransmision timer fired

Page 61: SCinet Caltech-SLAC experiments

netlab.caltech.edu

When Time-out Happens

Very bad for throughput Mark all unacknowledged pkts as lost and

do slow start Dup acks cause false retransmits since

receiver’s state is unknown Floyd has a “fix” (RFC 2582).

Page 62: SCinet Caltech-SLAC experiments

netlab.caltech.edu

TCP Congestion States

Established

Slow Start

High Throughput

FAST’sRecovery

FAST’s RetransmitTime-out *

ack for syn/ackcwnd > ssthresh

3 dup acks

retransmit packetrecord snd_nxt

reduce cwnd/ssthresh

snd_una > recorded snd_nxt

retransmision timer fired

Page 63: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Individual Packet States

Birth Sending In Flight Received

Queued Dropped Buffered

Freed Delivered

queueing

out of order queueand no memoryack’d

Page 64: SCinet Caltech-SLAC experiments

SCinet Bandwidth Challenge

netlab.caltech.edu/FAST

SC2002 Baltimore, Nov 2002

Experiment

Sunnyvale Baltimore

Chicago

Geneva

3000km 1000km

70

00

km

C. Jin, D. Wei, S. LowFAST Team and Partners

Internet: distributed feedbacksystem Rf (s)

Rb’(s)

x

p

TCP AQM

Theory

22.8.02IPv6

9.4.021 flow

29.3.00multiple

Balt

imore

-Geneva

Baltim

ore-

Sunn

yval

eSC20021 flow

SC20022 flows

SC200210 flows

I2 LSR

Sunnyvale

-Geneva

FAST TCP Standard MTU Peak window = 14,100 pkts 940 Mbps single flow/GE card

9.4 petabit-meter/sec 1.9 times LSR

9.4 Gbps with 10 flows 37.0 petabit-meter/sec 6.9 times LSR

16TB in 6 hours with 7 flows

Implementation Sender-side modification Delay based Stabilized Vegas

Highlights

Page 65: SCinet Caltech-SLAC experiments

netlab.caltech.edu

Baltim

ore-

Sunn

yval

e

Sun

nyv

ale-

Gen

eva

29.3.2000multiple

22.8.2002IPv6

9.4.20021 flow

SC2002 1 flow

SC200210 flows

FAST BMPS

I2 L

SR

Bmps Thruput Duration

37.0 9.40 Gbps min

9.42 940 Mbps 19 min

5.38 1.02 Gbps 82 sec

4.93 402 Mbps 13 sec

0.03 8 Mbps 60 min

FA

ST

Page 66: SCinet Caltech-SLAC experiments

netlab.caltech.edu

FAST: 7 flows

Statistics Data: 2.857 TB Distance: 3,936 km Delay: 85 msAverage Duration: 60 mins Thruput: 6.35 Gbps Bmps: 24.99 petab-m/sPeak Duration: 3.0 mins Thruput: 6.58 Gbps Bmps: 25.90 petab-m/s

Network SC2002 (Baltimore) SLAC (Sunnyvale), GE , Standard MTU

18 Nov 2002 Mon

cwnd = 6,658 pkts per flow

17 Nov 2002 Sun

Page 67: SCinet Caltech-SLAC experiments

netlab.caltech.edu

FAST: single flow

Statistics Data: 273 GB Distance: 10,025 km Delay: 180 msAverage Duration: 43 mins Thruput: 847 Mbps Bmps: 8.49 petab-m/sPeak Duration: 19.2 mins Thruput: 940 Mbps Bmps: 9.42 petab-m/s

Network CERN (Geneva) SLAC (Sunnyvale), GE, Standard MTU

17 Nov 2002 Sun

cwnd = 14,100 pkts

Page 68: SCinet Caltech-SLAC experiments

SCinet Bandwidth Challenge

netlab.caltech.edu/FAST

SC2002 Baltimore, Nov 2002

PrototypeC. Jin, D. Wei

TheoryD. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)

Experiment/facilities Caltech: J. Bunn, S. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J.

Pool, S. Ravot (Caltech/CERN), S. Singh CERN: O. Martin, P. Moroni Cisco: B. Aiken, V. Doraiswami, M. Turzanski, D. Walsten, S. Yip DataTAG: E. Martelli, J. P. Martin-Flatin Internet2: G. Almes, S. Corbato SCinet: G. Goddard, J. Patton SLAC: G. Buhrmaster, L. Cottrell, C. Logg, W. Matthews, R. Mount, J. Navratil StarLight: T. deFanti, L. Winkler

Major sponsors/partnersARO, CACR, Cisco, DataTAG, DoE, Lee Center, Level3, NSF

Acknowledgments