FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu

FAST TCP

Cheng Jin

David Wei

Steven Low

netlab.CALTECH.edu

Page 2

Acknowledgments

• Caltech: Bunn, Choe, Doyle, Hegde, Jayaraman, Newman, Ravot, Singh, X. Su, J. Wang, Xia
• UCLA: Paganini, Z. Wang
• CERN: Martin
• SLAC: Cottrell
• Internet2: Almes, Shalunov
• MIT Haystack Observatory: Lapsley, Whitney
• TeraGrid: Linda Winkler
• Cisco: Aiken, Doraiswami, McGugan, Yip
• Level(3): Fernes
• LANL: Wu

Page 3

Outline

• Motivation & approach
• FAST architecture
• Window control algorithm
• Experimental evaluation

skip: theoretical foundation

Page 4

Congestion control

[Figure: network with source rates x_i(t) and link congestion measures p_l(t)]

Example congestion measures p_l(t):
• Loss (Reno)
• Queueing delay (Vegas)

Page 5

TCP/AQM

• Congestion control is a distributed asynchronous algorithm to share bandwidth
• It has two components
  • TCP: adapts sending rate (window) to congestion
  • AQM: adjusts & feeds back congestion information
• They form a distributed feedback control system
  • Equilibrium & stability depend on both TCP and AQM
  • And on delay, capacity, routing, #connections

[Figure: feedback loop between TCP (x_i(t): Reno, Vegas) and AQM (p_l(t): DropTail, RED, REM/PI, AVQ)]

Page 6

Difficulties at large window

• Equilibrium problem
  • Packet level: AI too slow, MD too drastic
  • Flow level: required loss probability too small
• Dynamic problem
  • Packet level: must oscillate on binary signal
  • Flow level: unstable at large window
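To put numbers on the equilibrium problem, here is a rough worked example in Python. It assumes the Reno relation w = 1.225/sqrt(p) quoted later in the deck, plus an illustrative 10 Gbps path with 100 ms RTT and 1500-byte packets. The path parameters and function names are assumptions for illustration, not values from the slides.

```python
def window_packets(capacity_bps, rtt_s, packet_bytes=1500):
    """Window (in packets) needed to fill the path: the bandwidth-delay product."""
    return capacity_bps * rtt_s / (packet_bytes * 8)


def reno_required_loss(window):
    """Loss probability at which Reno's equilibrium window equals `window`,
    from w = 1.225 / sqrt(p)."""
    return (1.225 / window) ** 2


def reno_recovery_time(window, rtt_s):
    """Seconds for additive increase (+1 packet per RTT) to climb from
    window/2 back to window after a single multiplicative decrease."""
    return (window / 2) * rtt_s


# Illustrative path (assumed): 10 Gbps, 100 ms RTT, 1500-byte packets.
w = window_packets(10e9, 0.100)
p = reno_required_loss(w)
t = reno_recovery_time(w, 0.100)
print(f"window = {w:.0f} pkts, required loss = {p:.2e}, recovery = {t / 60:.0f} min")
```

Under these assumptions the window is about 83,000 packets, the loss probability must stay near 2e-10, and recovering from one loss takes over an hour, which is the sense in which AI is too slow and the required loss probability too small.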

Page 7

Packet & flow level: Reno TCP

Packet level
• ACK: W ← W + 1/W
• Loss: W ← W – 0.5W

Flow level
• Equilibrium: W = 1.225/sqrt(p) pkts (Mathis formula)
• Dynamics
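The link between the packet-level rules and the flow-level equilibrium can be checked numerically. The sketch below is a simplified per-RTT model, not Reno itself: it assumes the window grows by one packet per RTT (the aggregate of +1/W per ACK), that exactly one loss occurs every 1/p packets, and that each loss halves the window. The function name and these modelling choices are assumptions for illustration.

```python
import math


def reno_average_window(p, rtts=20000):
    """Average window of an idealized Reno sawtooth with one loss every 1/p packets."""
    w, sent, total = 1.0, 0.0, 0.0
    packets_per_loss = 1.0 / p
    for _ in range(rtts):
        total += w           # accumulate the window for the time average
        sent += w            # w packets are sent in this RTT
        if sent >= packets_per_loss:
            sent -= packets_per_loss
            w = max(1.0, w - 0.5 * w)   # Loss: W <- W - 0.5W
        else:
            w += 1.0                    # W ACKs, each adding 1/W
    return total / rtts


p = 1e-4
simulated = reno_average_window(p)
predicted = 1.225 / math.sqrt(p)
print(f"simulated {simulated:.1f} pkts, Mathis formula {predicted:.1f} pkts")
```

The simulated average lands within a few percent of the formula; the gap comes from the start-up ramp and the per-RTT discretization.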

Page 8

Reno TCP

• Packet level: designed and implemented first
• Flow level: understood afterwards
• Flow level dynamics determine
  • Equilibrium: performance, fairness
  • Stability

Design flow level equilibrium & stability
Implement flow level goals at packet level

Page 9

Reno TCP

• Packet level: designed and implemented first
• Flow level: understood afterwards
• Flow level dynamics determine
  • Equilibrium: performance, fairness
  • Stability

Packet level design of FAST, HSTCP, STCP guided by flow level properties

Page 10

Packet level

Reno: AIMD(1, 0.5)
• ACK: W ← W + 1/W
• Loss: W ← W – 0.5W

HSTCP: AIMD(a(w), b(w))
• ACK: W ← W + a(w)/W
• Loss: W ← W – b(w)W

STCP: MIMD(a, b)
• ACK: W ← W + 0.01
• Loss: W ← W – 0.125W

FAST
• RTT: W ← W · baseRTT/RTT + α
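The packet-level rules on this slide can be written as small update functions. This is a sketch for comparison only: the function names are made up, HSTCP is left out because the slide does not give a(w) and b(w), and the FAST rule assumes a per-RTT additive term alpha, which is the form of the published FAST update but is not legible in this transcript.

```python
def reno_update(w, loss):
    """Reno, AIMD(1, 0.5): applied per ACK, or once on a loss."""
    return w - 0.5 * w if loss else w + 1.0 / w


def stcp_update(w, loss):
    """STCP, MIMD(0.01, 0.125): applied per ACK, or once on a loss."""
    return w - 0.125 * w if loss else w + 0.01


def fast_update(w, base_rtt, rtt, alpha):
    """FAST: applied once per RTT, driven by delay rather than loss.
    `alpha` is an assumed name for the per-RTT additive term."""
    return (base_rtt / rtt) * w + alpha


# One loss-free RTT at a window of 1000 packets means about 1000 ACKs.
w_reno = w_stcp = 1000.0
for _ in range(1000):
    w_reno = reno_update(w_reno, loss=False)
    w_stcp = stcp_update(w_stcp, loss=False)
print(w_reno, w_stcp)
```

Over that one RTT Reno gains about one packet while STCP gains about 1% of its window, which is why the multiplicative rule scales better at large windows. The FAST rule has a fixed point where the queueing delay holds the window steady, for example `fast_update(1000, 0.1, 0.125, 200)` returns 1000.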

Page 11

Flow level: Reno, HSTCP, STCP, FAST

Similar flow level equilibrium, in pkts/sec (Mathis formula)

Equilibrium constant = 1.225 (Reno), 0.120 (HSTCP), 0.075 (STCP)
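To compare the three loss-based equilibria, the sketch below assumes the common form w = a / p^b, with rate x = w / RTT. The constants a are the ones on this slide. The exponents b (0.5, 0.835, 1) are an assumption taken from the published response functions of Reno, HSTCP and STCP; they do not appear in this transcript.

```python
# Protocol -> (a, b) in w = a / p**b.
# a: from the slide. b: assumed from the published response functions.
RESPONSE = {
    "Reno": (1.225, 0.5),
    "HSTCP": (0.120, 0.835),
    "STCP": (0.075, 1.0),
}


def equilibrium_window(protocol, p):
    """Equilibrium window in packets at loss probability p."""
    a, b = RESPONSE[protocol]
    return a / p ** b


def equilibrium_rate(protocol, p, rtt_s):
    """Equilibrium rate in pkts/sec for a flow with round-trip time rtt_s."""
    return equilibrium_window(protocol, p) / rtt_s


for name in RESPONSE:
    print(f"{name:6s} w = {equilibrium_window(name, 1e-7):>10.0f} pkts at p = 1e-7")
```

At p = 1e-7 this gives a few thousand packets for Reno, roughly 84,000 for HSTCP and 750,000 for STCP, so the same loss rate sustains very different windows even though the equilibrium has the same shape.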

Page 12

Flow level: Reno, HSTCP, STCP, FAST

Common flow level dynamics!

window adjustment = control gain × flow level goal

• Different gain and utility U_i
  • They determine equilibrium and stability
• Different congestion measure p_i
  • Loss probability (Reno, HSTCP, STCP)
  • Queueing delay (Vegas, FAST)

Page 13

Implementation strategy

Common flow level dynamics

window adjustment = control gain × flow level goal

• Small adjustment when close, large far away
  • Need to estimate how far current state is wrt target
  • Scalable
• Window adjustment independent of p_i
  • Depends only on current window
  • Difficult to scale

Page 14

Outline

• Motivation & approach
• FAST architecture
• Window control algorithm
• Experimental evaluation

skip: theoretical foundation

Page 15

Architecture

[Figure labels: RTT timescale, Loss recovery, < RTT timescale]

Page 16

Architecture

Each component
• designed independently
• upgraded asynchronously

Page 17

Architecture

Each component
• designed independently
• upgraded asynchronously

Window Control

Page 18

FAST-TCP basic idea

• Uses delay as congestion measure
• Delay provides finer congestion info
• Delay scales correctly with network capacity
• Can operate with low queueing delay

[Figure labels: Loss, C, Window, Queue Delay, FAST, Loss Based TCP]

Page 19

Window control algorithm

• Full utilization: regardless of bandwidth-delay product
• Globally stable: exponential convergence
• Fairness: weighted proportional fairness, parameter α
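The three properties listed here can be seen in a toy simulation. The sketch below assumes the periodic window update from the published FAST TCP description, w ← min(2w, (1 − γ)w + γ(baseRTT/RTT · w + α)), applied to flows that share one bottleneck and one propagation delay. The queue model (queueing delay equals the backlog beyond the pipe, divided by capacity) and all parameter values are simplifying assumptions, not the authors' implementation.

```python
def simulate_fast(alphas, capacity, base_rtt, gamma=0.5, steps=2000):
    """Iterate the FAST window update for flows sharing one bottleneck.

    alphas: per-flow alpha (target packets queued), capacity in pkts/s,
    base_rtt in seconds. Returns (per-flow rates in pkts/s, queueing delay in s).
    """
    windows = [float(a) for a in alphas]

    def queueing_delay(ws):
        # Packets beyond the pipe size (capacity * base_rtt) wait in the queue.
        return max(0.0, sum(ws) / capacity - base_rtt)

    for _ in range(steps):
        rtt = base_rtt + queueing_delay(windows)
        windows = [
            min(2 * w, (1 - gamma) * w + gamma * (base_rtt / rtt * w + a))
            for w, a in zip(windows, alphas)
        ]

    q = queueing_delay(windows)
    return [w / (base_rtt + q) for w in windows], q


rates, q = simulate_fast(alphas=[50, 100, 200], capacity=10000, base_rtt=0.1)
print([round(r) for r in rates], round(q, 4))
```

In this model the rates sum to the link capacity, each flow's rate is proportional to its alpha (rate = alpha / queueing delay), and the iteration settles geometrically, which mirrors the utilization, fairness and convergence claims on the slide.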

Page 20

Outline

• Motivation & approach
• FAST architecture
• Window control algorithm
• Experimental evaluation
  • Abilene-HENP network
  • Haystack Observatory
  • DummyNet

Page 21

Abilene Test

[Figure labels: OC48, OC192; Periodic losses every 10 mins]

(Yang Xia, Harvey Newman, Caltech)

Page 22

[Figure label: Periodic losses every 10 mins]

(Yang Xia, Harvey Newman, Caltech)

Page 23

[Figure labels: Periodic losses every 10 mins; FAST backs off to make room for Reno]

(Yang Xia, Harvey Newman, Caltech)

Page 24

Haystack Experiments

Lapsley, MIT Haystack

Page 25

Haystack - 1 Flow (Atlanta -> Japan)

• Iperf used to generate traffic.
• Sender is a Xeon 2.6 GHz.
• Window was constant: burstiness in rate due to host processing and ACK spacing.

Lapsley, MIT Haystack

Page 26

Haystack – 2 Flows from 1 machine (Atlanta -> Japan)

Lapsley, MIT Haystack

Page 27

Linux Loss Recovery

Timeout: all outstanding packets marked as lost.
1. SACKs reduce lost packets
2. Lost packets retransmitted slowly as cwnd is capped at 1 (bug).

Page 28

DummyNet Experiments

• Experiments using emulated network.
• 800 Mbps emulated bottleneck in DummyNet.

• Sender PC: Dual Xeon 2.6GHz, 2Gb, Intel GbE, Linux 2.4.22
• DummyNet PC: Dual Xeon 3.06GHz, 2Gb, FreeBSD 5.1, 800Mbps
• Receiver PC: Dual Xeon 2.6GHz, 2Gb, Intel GbE, Linux 2.4.22

Page 29

Dynamic sharing: 3 flows

[Figure panels: FAST, Linux]

Dynamic sharing on Dummynet
• capacity = 800Mbps
• delay = 120ms
• 3 flows
• iperf throughput
• Linux 2.4.x (HSTCP: UCL)

Page 30

Dynamic sharing: 3 flows

[Figure panels: FAST, Linux, HSTCP, BIC]

Steady throughput

Page 31

[Figure panels: FAST, Linux, STCP, HSTCP; traces: throughput, loss, queue; duration: 30min]

Dynamic sharing on Dummynet
• capacity = 800Mbps
• delay = 120ms
• 14 flows
• iperf throughput
• Linux 2.4.x (HSTCP: UCL)

Page 32

[Figure panels: FAST, Linux, HSTCP, BIC; traces: throughput, loss, queue; duration: 30min]

Room for mice!

Page 33

Average Queue vs Buffer Size

Dummynet
• capacity = 800Mbps
• delay = 200ms
• 1 flow
• buffer size: 50, …, 8000 pkts

(S. Hegde, B. Wydrowski, etc, Caltech)

Page 34

Is large queue necessary for high throughput?

Page 35

FAST TCP: motivation, architecture, algorithms, performance. IEEE Infocom March 2004

Release: April 2004

Source freely available for any non-profit use

netlab.caltech.edu/FAST

Page 36

Aggregate throughput

ideal performance

Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

Page 37

Aggregate throughput

[Figure labels: small window (800 pkts), large window (8000)]

Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

Page 38

Fairness

Jain’s index

HSTCP ~ Reno

Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts
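For reference, Jain's index for n flows with throughputs x_1..x_n is (Σ x_i)² / (n · Σ x_i²). It equals 1 when all flows get the same throughput and 1/n when a single flow takes everything. A minimal sketch follows; the function name and sample throughputs are illustrative and are not measurements from these experiments.

```python
def jain_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in the range [1/n, 1]."""
    n = len(throughputs)
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if n == 0 or squares == 0:
        raise ValueError("need at least one flow with non-zero throughput")
    return total * total / (n * squares)


# Illustrative values only (Mbps), not data from the slides.
print(jain_index([200, 200, 200, 200]))   # perfectly fair share
print(jain_index([500, 100, 100, 100]))   # one flow dominates
```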

Page 39

Stability

Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

stable in diverse scenarios

Page 40

FAST TCP: motivation, architecture, algorithms, performance. IEEE Infocom March 2004

Release: April 2004

Source freely available for any non-profit use

netlab.caltech.edu/FAST

Page 41

BACKUP Slides

Page 42

IP Rights

• Caltech owns IP rights
  • applicable more broadly than TCP
  • leave all options open
• IP freely available if FAST TCP becomes IETF standard
• Code available on FAST website for any non-commercial use

Page 43

WAN in Lab

Caltech: John Doyle, Raj Jayaraman, George Lee, Steven Low (PI), Harvey Newman, Demetri Psaltis, Xun Su, Yang Xia

Cisco: Bob Aiken, Vijay Doraiswami, Chris McGugan, Steven Yip

netlab.caltech.edu

NSF

Page 44

Key Personnel
• Steven Low, CS/EE
• Harvey Newman, Physics
• John Doyle, EE/CDS
• Demetri Psaltis, EE

Cisco
• Bob Aiken
• Vijay Doraiswami
• Chris McGugan
• Steven Yip

• Raj Jayaraman, CS
• Xun Su, Physics
• Yang Xia, Physics
• George Lee, CS

• 2 grad students
• 3 summer students
• Cisco engineers

Page 45

Spectrum of tools

[Figure: tools placed on log(cost) versus log(abstraction) axes: math, simulation, emulation, live network, WAN in Lab]

• math: Mathis formula, Optimization, Control theory, Nonlinear model, Stochastic model
• simulation: NS, SSFNet, QualNet, JavaSim
• emulation: DummyNet, EmuLab, ModelNet, WAIL
• live network: PlanetLab, Abilene, NLR, DataTAG, CENIC, WAIL, etc

?

…we use them all

Page 46

Spectrum of tools

math, simulation, emulation, live network, WAN in Lab

Distance High High High

Speed High High Low

Realism High High Low

Traffic High Low Low

Configurable Low Medium High

Monitoring Low Medium High

Cost High Medium Low

Critical in development, e.g. Web100

Page 47

Goal

• State-of-the-art hybrid WAN
  • High speed, large distance: 2.5G, 10G; 50 – 200ms
  • Wireless devices connected by optical core
• Controlled & repeatable experiments
• Reconfigurable & evolvable
• Built in monitoring capability

Page 48

WAN in Lab

5-year plan
• 6 Cisco ONS15454
• 4 routers
• 10s servers
• Wireless devices
• 800km fiber
• ~100ms RTT

[Figure: network diagram. Optical network of ONS15454 nodes at Sites A to F with ML-Series network modules and 10GE links of 100 km; four CISCO7613 bottleneck routers in OSPF areas 10, 20, 30 and 40; Linux server farms, Itanium 10GE servers and wireless components on 192.168.x/24 and 10.0.x/24 subnets.]

V. Doraiswami (Cisco), R. Jayaraman (Caltech)

Page 49

WAN in Lab

Year-1 plan
• 3 Cisco ONS 15454
• 2 routers
• 10s servers
• Wireless devices

[Figure: network diagram. Optical network of ONS15454 nodes at Sites A, B and D, with further ONS15454 nodes to support additional ML-Series cards, and a 10GE link of 100 km; two CISCO7613 bottleneck routers in OSPF areas 10 and 20; servers, Itanium 10GE servers and wireless components on 192.168.x/24 and 10.0.x/24 subnets.]

V. Doraiswami (Cisco), R. Jayaraman (Caltech)

Page 50

Hybrid Network

Scenarios:
• Ad hoc network
• Cellular network
• Sensor network

How optical core supports wireless edges?

X. Su (Caltech)

Page 51

Experiments

• Transport & network layer
  • TCP, AQM, TCP/IP interaction
• Wireless hybrid networking
  • Wireless media delivery
  • Fixed wireless access
  • Sensor networks
• Optical control plane
• Grid computing
  • UltraLight

Page 52

Unique capabilities

• WAN in Lab
  • Capacity: 2.5 – 10 Gbps
  • Delay: 0 – 100 ms round trip
  • Delay: 0 – 400 ms round trip
• Configurable & evolvable
  • Topology, rate, delays, routing
  • Always at cutting edge
• Flexible, active debugging
  • Passive monitoring, AQM
• Integral part of R&A networks
  • Transition from theory, implementation, demonstration, deployment
  • Transition from lab to marketplace
• Global resource
  • Part of global infrastructure UltraLight led by Newman

[Figure: WAN in Lab (Caltech) connected to research & production networks. Labels: Calren2/Abilene, StarLight, Chicago, SURFNet, Amsterdam, CERN, Geneva, Multi-Gbps, 50-200ms delay, Experiment]

Page 53

Network debugging

• Performance problems in real network
  • Simulation will miss
  • Emulation might miss
  • Live network hard to debug
• WAN in Lab
  • Passive monitoring inside network
  • Active debugging possible

Page 54

Passive monitoring

[Figure labels: Fiber splitter, DAG, RAID, Timestamp, Header, GPS, Monitor]

• No overhead on system
• Can capture full info at OC48
  • U of Waikato’s DAG card captures at OC48 speed
  • Can filter if necessary
  • Disk speed = 2.5Gbps*40/1500 = 66Mbps
• Monitors synchronized by GPS or cheaper alternatives
• Data stored for offline analysis

D. Wei (Caltech)
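The disk-speed figure is the link rate scaled by the fraction of each packet that is kept. A small sketch of that arithmetic follows, using the slide's values of 40 header bytes per 1500-byte packet on a 2.5 Gbps link. The function name is made up, and the second call uses a hypothetical 500-byte average packet size to show the sensitivity.

```python
def header_capture_rate(link_bps, header_bytes=40, packet_bytes=1500):
    """Bit rate written to disk when only the first `header_bytes` of each
    `packet_bytes`-sized packet are stored, on a fully loaded link."""
    return link_bps * header_bytes / packet_bytes


rate = header_capture_rate(2.5e9)
print(f"{rate / 1e6:.1f} Mbps")    # the slide's ~66 Mbps case

# Hypothetical smaller average packet size: the disk load grows in proportion.
small = header_capture_rate(2.5e9, packet_bytes=500)
print(f"{small / 1e6:.1f} Mbps")
```

The estimate depends on the packet-size mix: with smaller packets the header fraction grows, so a monitor sized for 66 Mbps would need headroom or the filtering mentioned on the slide.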

Page 55

Passive monitoring

[Figure: servers and routers with monitors attached along the path. Monitor detail labels: Fiber splitter, DAG, RAID, Timestamp, Header, GPS. Also labeled: Web100, MonALISA]

D. Wei (Caltech)

Page 56

UltraLight testbed

UltraLight team (Newman)

Page 57

Status

• Hardware
  • Optical transport design: finalized
  • IP infrastructure design: finalized (almost)
  • Wireless infrastructure design: finalized
  • Price negotiation/ordering/delivery: summer 04
• Software
  • Passive monitoring: summer student
  • Management software: 2005 -
• Physical lab
  • Renovation: to be completed by summer 04

Page 58

Status

[Figure: timeline from 2003 to 2007]

• fundraising
• NSF funds 10/03
• ARO funds 5/04
• hardware design
• physical building
• usable testbed 12/04
• monitoring
• traffic generation
• connected UltraLight
• useful testbed 12/05
• expansion
• support
• management

[Figure: WAN in Lab network diagram, repeated from Page 48]

Page 59

[Figure labels: CS Dept, Jorgensen Lab, NetLab, WAN in Lab]

G. Lee, R. Jayaraman, E. Nixon (Caltech)

Page 60

Summary

• Testbed driven by research agenda
  • Rich and strong networking effort
  • Integrated approach: theory + implementation + experiments
  • “A network that can break”
• Integral part of real testbeds
  • Part of global infrastructure UltraLight led by Harvey Newman (Caltech)
• Integrated monitoring & measurement facility
  • Fiber splitter passive monitors
  • MonALISA