Page 1: Web100/Net100 at Oak Ridge National Lab Tom Dunigan thd@ornl.gov August 1, 2002

Web100/Net100 at Oak Ridge National Lab

Tom Dunigan thd@ornl.gov

August 1, 2002

UT-BATTELLE, U.S. Department of Energy, Oak Ridge National Laboratory

Web100 at ORNL

• Funding and goals
• Web100 tools and insights
  – Java bandwidth server
  – instrumented probes and log daemon
  – trace daemons
  – my favorite Web100 variables
• TCP tuning with Web100
  – tuning daemon (WAD)
  – tuning buffer sizes, slow-start, AIMD/VMSS, delayed ACK, reordering, parallel

• Web100 needs

Net100: developing network-aware operating systems

• DOE-funded (Office of Science) project ($1M/yr, 3 yrs beginning 9/01)
• Principal investigators
  – Matt Mathis, PSC ([email protected])
  – Brian Tierney, LBNL ([email protected])
  – Tom Dunigan, ORNL (thd@ornl.gov); Florence Fowler; Nagi Rao
• Objective:
  – measure and understand end-to-end network and application performance
  – tune network applications (grid and bulk transfer)
  – first-year emphasis: bulk transfer over high delay/bandwidth nets
• Components (leverage Web100)
  – Network Tool Analysis Framework (NTAF)
    • tool design and analysis
    • active network probes and passive sensors
    • network metrics data base
  – transport protocol analysis
  – tuning daemon (WAD) to tune network flows based on network metrics

www.net100.org

Web100 tools

• Java applet bandwidth/client tester
  – measure in/out data rates
  – report flow characteristics
  – Try it: http://firebird.ccs.ornl.gov:7123
  – INSIGHTS:
    • what happened, what you can expect
    • from server log:
      – 25,755 flows
      – 53% with loss, 23% timeouts
• Post-transfer statistics
  – ttcp100/iperf100
  – Web100 daemon
    • avoid modifying applications
    • log designated paths/ports/variables
  – INSIGHTS: later...

Web100 tools

• Tracer daemon
  – collect Web100 variables at 0.1 second intervals
  – config file specifies
    • source/port dest/port
    • Web100 variables (current/delta)
  – log to disk with timestamp and CID
  – C and python (LBL-based)
  – INSIGHTS:
    • watch uninstrumented apps (GridFTP)
    • analyze flow dynamics with plots (cwnd, ssthresh, re-xmits, RTT…)
    • analyze tuned flows
    • aggregate parallel flow data

# traced config file
#local lport remote rport
0.0.0.0 0 124.55.182.7 0
0.0.0.0 0 134.67.45.9 0
#v=value d=delta
d PktsOut
d PktsRetrans
v CurrentCwnd
v SampledRTT
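The tracer config above is simple enough to read with a few lines of Python. The sketch below is only an illustration of the format as shown on the slide, not the actual traced code: the function names `parse_traced_config` and `log_record` are hypothetical, and the sample counter values are made up to show how "d" (delta) and "v" (value) entries would differ in a log record.

```python
CONFIG = """\
# traced config file
#local lport remote rport
0.0.0.0 0 124.55.182.7 0
0.0.0.0 0 134.67.45.9 0
#v=value d=delta
d PktsOut
d PktsRetrans
v CurrentCwnd
v SampledRTT
"""


def parse_traced_config(text):
    """Parse a traced-style config into (flows, variables).

    flows:     list of (local_addr, local_port, remote_addr, remote_port)
    variables: list of (mode, name), where mode is 'v' (value) or 'd' (delta)
    """
    flows, variables = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) == 4:
            laddr, lport, raddr, rport = fields
            flows.append((laddr, int(lport), raddr, int(rport)))
        elif len(fields) == 2 and fields[0] in ("v", "d"):
            variables.append((fields[0], fields[1]))
        else:
            raise ValueError(f"unrecognized config line: {raw!r}")
    return flows, variables


def log_record(prev, curr, variables):
    """Build one log record from two consecutive samples (dicts of counters)."""
    return {
        name: (curr[name] - prev[name]) if mode == "d" else curr[name]
        for mode, name in variables
    }


flows, variables = parse_traced_config(CONFIG)

# Hypothetical samples taken 0.1 s apart (values invented for illustration).
prev = {"PktsOut": 1000, "PktsRetrans": 3, "CurrentCwnd": 65536, "SampledRTT": 80}
curr = {"PktsOut": 1450, "PktsRetrans": 5, "CurrentCwnd": 32768, "SampledRTT": 95}
record = log_record(prev, curr, variables)
```

Logging deltas for counters and raw values for gauges keeps each record directly plottable per interval.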

My favorite Web100 variables

• Post-transfer
  – CurrentMSS/Timeouts: PIX firewall problems
  – RetransThresh: out of order packets
  – MaxCwnd/MaxSsthresh: path capacity, Linux 2.4 caching
  – MinRTT/MaxRTT/*RTO: queueing, bandwidth-delay
  – SendStall/OtherReductions: Linux 2.4 slowups
  – MaxRwinRcvd/Sndbuf: buffer limits, Web100 wscale clamp
  – CongestionSignals/PacketsRetrans: loss intensity
  – SndLimTime*: bottleneck
• Dynamic
  – CongestionSignals/PacketsRetrans/CurrentCwnd: type of loss, when (slow start)
  – SampledRTT: queueing delays
  – CurrentSsthresh/PktsOut: recovery, timeouts
  – CurrentRwinRcvd: Linux 2.4 window advertisement
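As a rough illustration of how two of the post-transfer variables above might be read, the sketch below picks the dominant sender-limited state from the SndLimTime* counters and computes retransmitted packets per congestion signal. The expansion of `SndLimTime*` into `SndLimTimeRwin`, `SndLimTimeCwnd`, and `SndLimTimeSender` is an assumption (the slide gives only the wildcard), the function names are hypothetical, and the sample numbers are invented.

```python
def classify_bottleneck(snd_lim_time):
    """Return (limiting_state, fraction_of_time) from SndLimTime* counters.

    snd_lim_time maps a counter name to the time spent limited by that cause.
    """
    total = sum(snd_lim_time.values())
    if total == 0:
        raise ValueError("no time recorded in any sender-limited state")
    state = max(snd_lim_time, key=snd_lim_time.get)
    return state, snd_lim_time[state] / total


def loss_intensity(congestion_signals, pkts_retrans):
    """Average number of packets retransmitted per congestion signal."""
    if congestion_signals == 0:
        return 0.0
    return pkts_retrans / congestion_signals


# Invented sample: names assumed, values in microseconds.
sample = {
    "SndLimTimeRwin": 7_500_000,    # limited by the receiver's advertised window
    "SndLimTimeCwnd": 2_000_000,    # limited by the congestion window
    "SndLimTimeSender": 500_000,    # limited by the sending application/host
}
state, fraction = classify_bottleneck(sample)
```

A flow that spends most of its time receiver-window limited points at buffer settings rather than at the network path.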

PIX SACK problem

Web100 reports timeouts into ORNL, not at other sites ??

Theory 1: yet another Linux 2.4 TCP feature
  our TCP-over-UDP: no timeouts

Tcpdump/tcptrace/xplot of flow both inside and outside ORNL
  ? Tcptrace bug -- SACK blocks wrong for one of the dumps… NOT.
  ORNL PIX firewall was randomizing TCP sequence numbers, but failed to adjust SACK blocks
  RESULT: TCP timeouts
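The failure mode above can be sketched in a few lines: a middlebox that shifts sequence numbers by a per-connection offset must also shift the SACK blocks coming back, otherwise the sender sees blocks that do not refer to any data it has outstanding. This is a simplified model for illustration only; the offset value, the window edges, and the function names are hypothetical, and it does not reproduce actual PIX behavior.

```python
MOD = 2**32  # TCP sequence numbers wrap at 32 bits


def rewrite_seq(seq, offset):
    """Shift a sequence number by the firewall's per-connection offset."""
    return (seq + offset) % MOD


def sack_block_plausible(block, snd_una, snd_nxt):
    """True if a SACK block lies within the sender's outstanding data."""
    left, right = block
    span = (snd_nxt - snd_una) % MOD
    return ((left - snd_una) % MOD) < ((right - snd_una) % MOD) <= span


# Sender's view (inside the firewall): bytes 1000..11000 are outstanding.
SND_UNA, SND_NXT = 1000, 11000
OFFSET = 0x5A5A0000  # hypothetical randomization offset

# The receiver (outside) SACKs bytes 3000..5000 in the shifted sequence space.
outside_block = (rewrite_seq(3000, OFFSET), rewrite_seq(5000, OFFSET))

# Correct firewall: undo the offset on SACK blocks on the way back in.
fixed_block = tuple(rewrite_seq(edge, -OFFSET) for edge in outside_block)
# Buggy firewall: passes the SACK blocks through unmodified.
buggy_block = outside_block

fixed_ok = sack_block_plausible(fixed_block, SND_UNA, SND_NXT)
buggy_ok = sack_block_plausible(buggy_block, SND_UNA, SND_NXT)
```

With unusable SACK information the sender cannot repair holes selectively, which is consistent with the timeouts reported above.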

TCP tuning with Web100+/Net100

• Path characterization (NTAF)
  – both active and passive measurement
  – data base of measurement data
  – NTAF/Web100 hosts at PSC, NCAR, LBL, ORNL
• Application tuning (tuning daemon, WAD)
  – Web100 extensions
    • disable Linux 2.4 caching/SendStall
    • event notification
    • more tuning options
  – daemon tunes application at start up
    • static tuning information
    • query NTAF and calculate optimum TCP parameters
  – dynamically tune application (Web100 feedback)
    • adjust parameters during flow
    • split optimum among parallel flows
• Transport protocol optimizations
  – what to tune?
  – is it fair? stable?

Net100 TCP tuning

• TCP performance
  – reliable/stable/fair
  – need buffer = bandwidth*RTT
    • ORNL/NERSC (80 ms, OC12) need 6 MB
  – TCP slow-start and loss recovery proportional to MSS/RTT
    • slow on today’s high delay/bandwidth paths
  – TCP is lossy by design
• TCP tuning
  – set optimal (?) buffer size
  – avoid losses
    • modified slow-start
    • reduce bursts
    • anticipate (Vegas?) loss
    • reorder threshold
  – speed recovery
    • bigger MTU or “virtual MSS”
    • modified AIMD (0.5,1)
    • delayed ACKs and initial window

ns simulation: 500 Mb/s link, 80 ms RTT. Packet loss early in slow start. Standard TCP with delayed ACK takes 10 minutes to recover!
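The two numbers on this slide can be checked with back-of-the-envelope arithmetic. The sketch below assumes a 1500-byte MSS, an OC12 rate of 622.08 Mb/s, and a window that restarts from about one segment after the early loss and then grows linearly (half a segment per RTT with delayed ACKs); these are assumptions for illustration, not parameters taken from the ns run.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: buffer needed to keep the path full."""
    return bandwidth_bps * rtt_s / 8


def linear_recovery_seconds(bandwidth_bps, rtt_s, mss_bytes=1500,
                            start_segments=1, segments_per_rtt=1.0):
    """Time for additive increase to grow cwnd from start_segments to the BDP.

    ASSUMPTION: recovery is pure linear growth (congestion avoidance only).
    """
    target_segments = bdp_bytes(bandwidth_bps, rtt_s) / mss_bytes
    rtts = max(0.0, target_segments - start_segments) / segments_per_rtt
    return rtts * rtt_s


# ORNL/NERSC: OC12 (622.08 Mb/s assumed), 80 ms RTT -> roughly 6 MB of buffer.
oc12_buffer = bdp_bytes(622.08e6, 0.080)

# ns case: 500 Mb/s, 80 ms RTT, delayed ACKs halve the growth rate.
recover_delack = linear_recovery_seconds(500e6, 0.080, segments_per_rtt=0.5)
recover_no_delack = linear_recovery_seconds(500e6, 0.080, segments_per_rtt=1.0)

print(f"OC12 buffer: {oc12_buffer / 1e6:.1f} MB")
print(f"recovery with delayed ACKs: {recover_delack / 60:.1f} min")
print(f"recovery without delayed ACKs: {recover_no_delack / 60:.1f} min")
```

Under these assumptions the delayed-ACK case comes out at roughly nine minutes, in line with the "10 minutes" on the slide.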

Net100 TCP tuning

• Work-around Daemon (WAD)
  – tune unknowing sender/receiver at startup and/or during flow
  – Web100 kernel extensions
    • uses netlink to alert daemon of socket open/close
    • besides existing Web100 buffer tuning, new code and WAD_* variables
    • knobs to disable Linux 2.4 caching and SendStall
  – config file with static tuning data
    • mode specifies dynamic tuning (Floyd AIMD, NTAF buffer size, concurrent streams)
  – daemon periodically polls NTAF for fresh tuning data
  – written in C (LBL has python version)

WAD config file

[bob]
src_addr: 0.0.0.0
src_port: 0
dst_addr: 10.5.128.74
dst_port: 0
mode: 1
sndbuf: 2000000
rcvbuf: 100000
wadai: 6
wadmd: 0.3
maxssth: 100
divide: 1
reorder: 9
delack: 0
floyd: 1
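The config entry above happens to be valid INI syntax, so Python's standard `configparser` can read it. The sketch below shows how a daemon might look up the tuning rule for a newly opened socket. It is not the WAD implementation: the function names are hypothetical, the example flow addresses are invented, and treating `0.0.0.0` and port `0` as wildcards is an assumption about the file's semantics.

```python
import configparser

WAD_CONFIG = """
[bob]
src_addr: 0.0.0.0
src_port: 0
dst_addr: 10.5.128.74
dst_port: 0
mode: 1
sndbuf: 2000000
rcvbuf: 100000
wadai: 6
wadmd: 0.3
maxssth: 100
divide: 1
reorder: 9
delack: 0
floyd: 1
"""


def load_wad_config(text):
    """Return {section_name: {key: value_string}} for each config entry."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def match_flow(rules, src_addr, src_port, dst_addr, dst_port):
    """Return the first entry matching a flow, or None.

    ASSUMPTION: address 0.0.0.0 and port 0 act as wildcards.
    """
    def ok(rule_value, actual, wildcard):
        return rule_value == wildcard or rule_value == str(actual)

    for name, rule in rules.items():
        if (ok(rule["src_addr"], src_addr, "0.0.0.0")
                and ok(rule["src_port"], src_port, "0")
                and ok(rule["dst_addr"], dst_addr, "0.0.0.0")
                and ok(rule["dst_port"], dst_port, "0")):
            return name
    return None


rules = load_wad_config(WAD_CONFIG)
hit = match_flow(rules, "192.0.2.1", 5001, "10.5.128.74", 2811)
miss = match_flow(rules, "192.0.2.1", 5001, "10.9.9.9", 2811)
```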

WAD tuning results (your mileage may vary …)

Classic buffer tuning: ORNL to PSC, OC12, 80 ms RTT
  network-challenged app. gets 10 Mb/s
  same app., WAD/NTAF tuned buffer gets 143 Mb/s

Virtual MSS
  tune TCP’s additive increase (WAD_AI)
  add k segments per RTT during recovery
  k=6 like GigE jumbo frame
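To see why a larger additive increase helps, the sketch below counts the RTTs needed to regain the window after a single loss for k=1 and k=6. The window size (about 4147 segments, roughly an OC12 at 80 ms with a 1500-byte MSS) and the function name are assumptions for illustration; k=6 corresponds to a 9000-byte jumbo frame measured in 1500-byte segments.

```python
import math


def rtts_to_recover(cwnd_before_loss, md=0.5, ai=1):
    """RTTs of additive increase needed to regain cwnd after one loss.

    cwnd_before_loss: window in segments when the loss occurred
    md: fraction of the window removed on loss (multiplicative decrease)
    ai: segments added per RTT (additive increase, the "virtual MSS" k)
    """
    deficit = cwnd_before_loss * md
    return math.ceil(deficit / ai)


RTT_S = 0.080
CWND = 4147  # ASSUMPTION: ~6.2 MB window in 1500-byte segments

standard = rtts_to_recover(CWND, ai=1)
virtual_mss = rtts_to_recover(CWND, ai=6)

print(f"k=1: {standard} RTTs, {standard * RTT_S:.0f} s")
print(f"k=6: {virtual_mss} RTTs, {virtual_mss * RTT_S:.0f} s")
```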

WAD tuning

Modified slow-start and AI
  ORNL to NERSC, OC12, 80 ms RTT
  often losses in slow start
  WAD tuned Floyd slowstart (WAD_MaxThresh) and AI (6)
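Floyd's modified slow start is specified in RFC 3742 (Limited Slow-Start): once cwnd exceeds max_ssthresh, each ACK adds only 1/K of a segment, with K = int(cwnd / (0.5 * max_ssthresh)), so the window grows by at most about max_ssthresh/2 segments per RTT. The sketch below follows that formula; whether the WAD kernel code implements it exactly this way is an assumption, and the function name is hypothetical.

```python
def limited_slow_start_rtt(cwnd, max_ssthresh):
    """Advance cwnd (in segments) through one RTT of slow start.

    ASSUMPTION: one ACK per segment in flight (no delayed ACKs, no loss).
    """
    for _ in range(int(cwnd)):
        if cwnd <= max_ssthresh:
            cwnd += 1.0              # classic slow start: +1 segment per ACK
        else:
            k = int(cwnd / (0.5 * max_ssthresh))
            cwnd += 1.0 / k          # limited slow start: +1/K segment per ACK
    return cwnd


# Trace a few RTTs with max_ssthresh = 100 segments.
cwnd = 1.0
trace = []
for _ in range(10):
    cwnd = limited_slow_start_rtt(cwnd, 100)
    trace.append(round(cwnd, 1))
```

Below the threshold the window still doubles every RTT; above it the bursts that tend to cause slow-start losses are bounded.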

WAD tuned AIMD and slow start
  ORNL to CERN, OC12, 150 ms RTT
  parallel streams AIMD (1/(2k),k)
  WAD tune single stream (0.125,4) (WAD_MD)
  Can tuned single stream compete with parallel streams?

pre-tune Floyd AIMD or dynamically adjust

tune concurrent flows -- subdivide buffer
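The AIMD (1/(2k),k) rule above says that k parallel streams behave roughly like one stream that gives up 1/(2k) of its window on loss and adds k segments per RTT; k=4 yields the (0.125,4) setting used for the single tuned stream. A minimal sketch, with hypothetical function names and an even split assumed for the buffer subdivision:

```python
def equivalent_aimd(k):
    """(multiplicative decrease, additive increase) imitating k parallel streams."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1 / (2 * k), k


def per_flow_buffer(total_buffer_bytes, flows):
    """Split a path's buffer budget evenly among concurrent flows.

    ASSUMPTION: an even split; other policies are possible.
    """
    if flows < 1:
        raise ValueError("flows must be at least 1")
    return total_buffer_bytes // flows


md, ai = equivalent_aimd(4)
share = per_flow_buffer(6_000_000, 4)
```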

Net100 TCP tuning

Reorder threshold
  seeing more out of order packets
  WAD tune a bigger reorder threshold
  Linux 2.4 does a good job already
  LBL to ORNL (using our TCP-over-UDP): dup3 case had 289 retransmits, but all were unneeded!

WAD could turn off delayed ACKs -- 2x improvement in recovery rate and slowstart
  Linux 2.4 already turns off delayed ACKs for initial slow-start

WARNING: could be unfair, probably stable
  use only on intranet
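A toy model makes the reorder-threshold effect concrete: with no loss at all, a packet arriving three or more positions late produces three duplicate ACKs and triggers a fast retransmit that was never needed. The sketch below counts such spurious retransmits for a given arrival order; it uses packet numbers rather than byte sequence numbers, the function name is hypothetical, and the arrival order is invented.

```python
def spurious_fast_retransmits(arrivals, dupthresh=3):
    """Count fast retransmits triggered purely by reordering.

    arrivals: packet numbers (starting at 1) in the order the receiver sees
              them; every packet arrives, so any retransmit is unnecessary.
    """
    received = set()
    next_expected = 1
    dup_acks = 0
    retransmits = 0
    for seq in arrivals:
        received.add(seq)
        if seq == next_expected:
            while next_expected in received:
                next_expected += 1
            dup_acks = 0             # cumulative ACK advanced
        else:
            dup_acks += 1            # receiver repeats the ACK for next_expected
            if dup_acks == dupthresh:
                retransmits += 1     # sender fast-retransmits next_expected
    return retransmits


order = [1, 3, 4, 5, 2, 6]           # packet 2 arrives three positions late
with_dup3 = spurious_fast_retransmits(order, dupthresh=3)
with_dup9 = spurious_fast_retransmits(order, dupthresh=9)
```

Raising the threshold trades slower detection of real losses for fewer needless retransmits and window reductions.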

Web100 has proven very useful for experimenting with TCP tuning options.

Futures

• Net100
  – analyze effectiveness of current tuning options
  – NTAF probes -- characterizing a path to tune a flow
  – additional tuning algorithms
  – parallel/multipath selection/tuning
  – WAD-to-WAD tuning
• Web100 extensions
  – Web100 trace files -- log all data efficiently
  – variable for count of duplicate data segments at receiver
  – remove wscale restriction

www.net100.org