Web100/Net100 at Oak Ridge National Lab
Tom Dunigan, thd@ornl.gov
August 1, 2002
UT-BATTELLE, U.S. Department of Energy, Oak Ridge National Laboratory
Web100 at ORNL
• Funding and goals
• Web100 tools and insights
  – Java bandwidth server
  – instrumented probes and log daemon
  – trace daemons
  – my favorite Web100 variables
• TCP tuning with Web100
  – tuning daemon (WAD)
  – tuning buffer sizes, slow-start, AIMD/VMSS, delayed ACK, reordering, parallel
• Web100 needs
Net100: developing network-aware operating systems
• DOE-funded (Office of Science) project ($1M/yr, 3 yrs beginning 9/01)
• Principal investigators
  – Matt Mathis, PSC
  – Brian Tierney, LBNL
  – Tom Dunigan, ORNL (thd@ornl.gov)
    Florence Fowler, Nagi Rao
• Objective:
  – measure and understand end-to-end network and application performance
  – tune network applications (grid and bulk transfer)
  – first-year emphasis: bulk transfer over high delay/bandwidth nets
• Components (leverage Web100)
  – Network Tool Analysis Framework (NTAF)
    • tool design and analysis
    • active network probes and passive sensors
    • network metrics database
  – transport protocol analysis
  – tuning daemon (WAD) to tune network flows based on network metrics
www.net100.org
Web100 tools
• Java applet bandwidth/client tester
  – measure in/out data rates
  – report flow characteristics
  – Try it: http://firebird.ccs.ornl.gov:7123
  – INSIGHTS:
    • what happened, what you can expect
    • from server log:
      – 25,755 flows
      – 53% with loss, 23% with timeouts
• Post-transfer statistics
  – ttcp100/iperf100
  – Web100 daemon
    • avoid modifying applications
    • log designated paths/ports/variables
  – INSIGHTS: later...
Web100 tools
• Tracer daemon
  – collect Web100 variables at 0.1 second intervals
  – config file specifies
    • source/port dest/port
    • Web100 variables (current/delta)
  – log to disk with timestamp and CID
  – C and Python (LBL-based)
  – INSIGHTS:
    • watch uninstrumented apps (GridFTP)
    • analyze flow dynamics with plots (cwnd, ssthresh, retransmits, RTT…)
    • analyze tuned flows
    • aggregate parallel flow data
    # traced config file
    #local lport remote rport
    0.0.0.0 0 124.55.182.7 0
    0.0.0.0 0 134.67.45.9 0
    #v=value d=delta
    d PktsOut
    d PktsRetrans
    v CurrentCwnd
    v SampledRTT
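A config in this shape could be read with a few lines of Python. This is only a minimal sketch, not the actual tracer daemon code: the function names `parse_traced_config` and `log_record` are hypothetical, the layout (four-field flow lines followed by `v`/`d` variable lines) is inferred from the slide's example, and the sample counter values are invented. The meaning of `0.0.0.0` and port `0` (presumably wildcards) is not interpreted here.

```python
from typing import Dict, List, Tuple

Flow = Tuple[str, int, str, int]  # (local addr, local port, remote addr, remote port)

SAMPLE = """\
# traced config file
#local lport remote rport
0.0.0.0 0 124.55.182.7 0
0.0.0.0 0 134.67.45.9 0
#v=value d=delta
d PktsOut
d PktsRetrans
v CurrentCwnd
v SampledRTT
"""


def parse_traced_config(text: str) -> Tuple[List[Flow], List[Tuple[str, str]]]:
    """Split a traced-style config into flow selectors and (mode, variable) pairs."""
    flows: List[Flow] = []
    variables: List[Tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        fields = line.split()
        if len(fields) == 4:
            flows.append((fields[0], int(fields[1]), fields[2], int(fields[3])))
        elif len(fields) == 2 and fields[0] in ("v", "d"):
            variables.append((fields[0], fields[1]))
        else:
            raise ValueError(f"unrecognized config line: {raw!r}")
    return flows, variables


def log_record(variables: List[Tuple[str, str]],
               previous: Dict[str, int],
               current: Dict[str, int]) -> Dict[str, int]:
    """Build one log row: raw value for 'v' variables, change since the last sample for 'd'."""
    row = {}
    for mode, name in variables:
        if mode == "d":
            row[name] = current[name] - previous.get(name, 0)
        else:
            row[name] = current[name]
    return row


if __name__ == "__main__":
    flows, variables = parse_traced_config(SAMPLE)
    before = {"PktsOut": 100, "PktsRetrans": 2}          # invented sample values
    after = {"PktsOut": 150, "PktsRetrans": 3,
             "CurrentCwnd": 64000, "SampledRTT": 80}
    print(flows)
    print(log_record(variables, before, after))
```

Logging counters such as PktsOut as deltas keeps each 0.1 second row meaningful on its own, while gauges such as CurrentCwnd are more useful as raw values.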
My favorite Web100 variables
• Post-transfer
  – CurrentMSS/Timeouts: PIX firewall problems
  – RetransThresh: out-of-order packets
  – MaxCwnd/MaxSsthresh: path capacity, Linux 2.4 caching
  – MinRTT/MaxRTT/*RTO: queuing, bandwidth-delay
  – SendStall/OtherReductions: Linux 2.4 slowups
  – MaxRwinRcvd/Sndbuf: buffer limits, Web100 wscale clamp
  – CongestionSignals/PacketsRetrans: loss intensity
  – SndLimTime*: bottleneck
• Dynamic
  – CongestionSignals/PacketsRetrans/CurrentCwnd: type of loss, when (slow start)
  – SampledRTT: queuing delays
  – CurrentSsthresh/PktsOut: recovery, timeouts
  – CurrentRwinRcvd: Linux 2.4 window advertisement
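As an illustration of how the SndLimTime* counters can point at a bottleneck, the sketch below turns three of them into shares of total time. It is an assumption-laden example: the specific names `SndLimTimeRwin`, `SndLimTimeCwnd` and `SndLimTimeSender` are taken from memory of the Web100 variable set rather than from the slides, the helper `bottleneck` is hypothetical, and the sample numbers are invented.

```python
from typing import Dict, Optional, Tuple


def bottleneck(snd_lim_time: Dict[str, int]) -> Tuple[Optional[str], Dict[str, float]]:
    """Return the dominant limiter and each limiter's share of total time.

    The input maps a SndLimTime* variable name to the time the sender
    spent limited by that cause (any consistent time unit).
    """
    total = sum(snd_lim_time.values())
    if total == 0:
        return None, {}
    shares = {name: value / total for name, value in snd_lim_time.items()}
    limiter = max(shares, key=shares.get)
    return limiter, shares


if __name__ == "__main__":
    # Invented post-transfer reading: mostly receiver-window limited,
    # which would suggest a too-small receive buffer.
    sample = {
        "SndLimTimeRwin": 9_000_000,
        "SndLimTimeCwnd": 500_000,
        "SndLimTimeSender": 500_000,
    }
    limiter, shares = bottleneck(sample)
    print(limiter, {name: round(share, 2) for name, share in shares.items()})
```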
PIX SACK problem
Web100 reports timeouts on flows into ORNL, but not at other sites. Why?

Theory 1: yet another Linux 2.4 TCP feature? Our TCP-over-UDP shows no timeouts.

Tcpdump/tcptrace/xplot of the flow, captured both inside and outside ORNL.
A tcptrace bug? The SACK blocks are wrong for one of the dumps… NOT.
The ORNL PIX firewall was randomizing TCP sequence numbers but failed to adjust the SACK blocks.
RESULT: TCP timeouts
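The mismatch can be illustrated with a toy model of a sequence-randomizing middlebox. This is a sketch under stated assumptions, not a description of the PIX implementation: the offset value, the traffic direction, the sequence numbers and all function names are hypothetical. It shows only that a SACK block left in the outside sequence space falls outside the sender's window once the cumulative ACK has been translated back.

```python
MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def to_outside(seq, offset):
    """Sequence number as seen outside the firewall after randomization."""
    return (seq + offset) % MOD


def translate_inbound(ack, sack_blocks, offset, adjust_sack):
    """Map a returning ACK into the inside host's sequence space.

    adjust_sack=False models the faulty behaviour: the cumulative ACK is
    translated but the SACK option is passed through untouched.
    """
    ack_inside = (ack - offset) % MOD
    if adjust_sack:
        blocks = [((left - offset) % MOD, (right - offset) % MOD)
                  for left, right in sack_blocks]
    else:
        blocks = list(sack_blocks)
    return ack_inside, blocks


def sack_in_window(block, snd_una, snd_nxt):
    """True if the block lies within the sender's outstanding data."""
    left, right = block
    window = (snd_nxt - snd_una) % MOD
    rel_left = (left - snd_una) % MOD
    rel_right = (right - snd_una) % MOD
    return rel_left < rel_right <= window


if __name__ == "__main__":
    offset = 0x5A5A5A5A                 # hypothetical per-connection offset
    snd_una, snd_nxt = 1000, 11000      # sender has bytes 1000..11000 in flight
    # Receiver is missing 1000..2000 and holds 2000..5000.
    wire_ack = to_outside(1000, offset)
    wire_sack = [(to_outside(2000, offset), to_outside(5000, offset))]
    for adjust in (True, False):
        ack, blocks = translate_inbound(wire_ack, wire_sack, offset, adjust)
        print(adjust, ack, blocks, sack_in_window(blocks[0], snd_una, snd_nxt))
```

A sender that cannot use the SACK information may end up waiting for its retransmission timer, which would be consistent with the timeouts Web100 reported; how a particular stack reacts to such blocks is outside this sketch.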
TCP tuning with Web100+/Net100
• Path characterization (NTAF)
  – both active and passive measurement
  – database of measurement data
  – NTAF/Web100 hosts at PSC, NCAR, LBL, ORNL
• Application tuning (tuning daemon, WAD)
  – Web100 extensions
    • disable Linux 2.4 caching/SendStall
    • event notification
    • more tuning options
  – daemon tunes application at startup
    • static tuning information
    • query NTAF and calculate optimum TCP parameters
  – dynamically tune application (Web100 feedback)
    • adjust parameters during flow
    • split optimum among parallel flows
• Transport protocol optimizations
  – what to tune?
  – is it fair? stable?
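One simple reading of "split optimum among parallel flows" is an even division of the path's optimum buffer. The sketch below assumes exactly that; the even split and the function name `per_flow_buffer` are assumptions for illustration, not the WAD's documented policy.

```python
def per_flow_buffer(bandwidth_bps: float, rtt_s: float, n_flows: int) -> int:
    """Evenly divide the bandwidth*RTT buffer (in bytes) among n parallel flows."""
    if n_flows < 1:
        raise ValueError("need at least one flow")
    total_bytes = round(bandwidth_bps * rtt_s / 8)
    return total_bytes // n_flows


if __name__ == "__main__":
    # OC12 line rate (622.08 Mb/s) and 80 ms RTT, shared by 4 streams.
    print(per_flow_buffer(622.08e6, 0.080, 4))
```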
Net100 TCP tuning
• TCP performance
  – reliable/stable/fair
  – need buffer = bandwidth*RTT
    • ORNL/NERSC (80 ms, OC12) need 6 MB
  – TCP slow-start and loss recovery proportional to MSS/RTT
    • slow on today’s high delay/bandwidth paths
  – TCP is lossy by design
• TCP tuning
  – set optimal (?) buffer size
  – avoid losses
    • modified slow-start
    • reduce bursts
    • anticipate (Vegas?) loss
    • reorder threshold
  – speed recovery
    • bigger MTU or “virtual MSS”
    • modified AIMD (0.5,1)
    • delayed ACKs and initial window

ns simulation: 500 Mb/s link, 80 ms RTT. Packet loss early in slow start. Standard TCP with delayed ACKs takes 10 minutes to recover!
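The two numbers on this slide can be checked with back-of-the-envelope arithmetic. The sketch below is a simplification, not the ns simulation itself: it assumes the OC12 line rate of 622.08 Mb/s, 1500-byte segments, a window that restarts at about one segment after the early loss, and linear growth of 0.5 segment per RTT when delayed ACKs are on. The function names are hypothetical.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Buffer needed to fill the pipe: bandwidth * RTT, in bytes."""
    return bandwidth_bps * rtt_s / 8


def recovery_seconds(bandwidth_bps: float, rtt_s: float,
                     segment_bytes: int = 1500,
                     start_segments: float = 1.0,
                     increase_per_rtt: float = 1.0) -> float:
    """Time for linear (additive increase) growth to reach a full window."""
    target_segments = bdp_bytes(bandwidth_bps, rtt_s) / segment_bytes
    rtts_needed = max(0.0, target_segments - start_segments) / increase_per_rtt
    return rtts_needed * rtt_s


if __name__ == "__main__":
    # ORNL/NERSC: OC12, 80 ms RTT
    print(bdp_bytes(622.08e6, 0.080) / 1e6, "MB")
    # ns case: 500 Mb/s, 80 ms RTT, delayed ACKs halve the growth rate
    print(recovery_seconds(500e6, 0.080, increase_per_rtt=0.5) / 60, "minutes")
```

Under these assumptions the buffer comes to about 6.2 MB and the recovery to roughly nine minutes, in line with the slide's 6 MB and 10 minutes. Passing a larger `increase_per_rtt` models the "virtual MSS" option listed under "speed recovery".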
Net100 TCP tuning
• Work-around Daemon (WAD)
  – tune unknowing sender/receiver at startup and/or during flow
  – Web100 kernel extensions
    • uses netlink to alert daemon of socket open/close
    • besides existing Web100 buffer tuning, new code and WAD_* variables
    • knobs to disable Linux 2.4 caching and SendStall
  – config file with static tuning data
    • mode specifies dynamic tuning (Floyd AIMD, NTAF buffer size, concurrent streams)
  – daemon periodically polls NTAF for fresh tuning data
  – written in C (LBL has Python version)
WAD config file
    [bob]
    src_addr: 0.0.0.0
    src_port: 0
    dst_addr: 10.5.128.74
    dst_port: 0
    mode: 1
    sndbuf: 2000000
    rcvbuf: 100000
    wadai: 6
    wadmd: 0.3
    maxssth: 100
    divide: 1
    reorder: 9
    delack: 0
    floyd: 1
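Because the WAD config file uses a bracketed section name and `key: value` lines, Python's standard `configparser` can read it directly. The sketch below is not the WAD's own parser (the slides say the daemon is written in C, with a Python version at LBL); the function name `load_wad_config` is hypothetical, and the reading of `wadai`/`wadmd` as the additive-increase and multiplicative-decrease settings is inferred from the WAD_AI and WAD_MD names elsewhere in the slides.

```python
import configparser

SAMPLE = """\
[bob]
src_addr: 0.0.0.0
src_port: 0
dst_addr: 10.5.128.74
dst_port: 0
mode: 1
sndbuf: 2000000
rcvbuf: 100000
wadai: 6
wadmd: 0.3
maxssth: 100
divide: 1
reorder: 9
delack: 0
floyd: 1
"""


def _convert(value: str):
    """Turn a field into int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # e.g. dotted-quad addresses stay as text


def load_wad_config(text: str) -> dict:
    """Parse WAD-style config text into {section: {key: typed value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        name: {key: _convert(value) for key, value in parser[name].items()}
        for name in parser.sections()
    }


if __name__ == "__main__":
    config = load_wad_config(SAMPLE)
    entry = config["bob"]
    print(entry["dst_addr"], entry["sndbuf"], entry["wadai"], entry["wadmd"])
```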
WAD tuning results (your mileage may vary …)
Classic buffer tuning: ORNL to PSC, OC12, 80 ms RTT
  network-challenged app gets 10 Mb/s
  same app with WAD/NTAF-tuned buffer gets 143 Mb/s

Virtual MSS
  tune TCP’s additive increase (WAD_AI)
  add k segments per RTT during recovery
  k=6 is like a GigE jumbo frame
WAD tuning
Modified slow-start and AI: ORNL to NERSC, OC12, 80 ms RTT
  often losses in slow start
  WAD-tuned Floyd slow-start (WAD_MaxThresh) and AI (6)

WAD-tuned AIMD and slow start: ORNL to CERN, OC12, 150 ms RTT
  parallel streams: AIMD (1/(2k), k)
  WAD tunes a single stream to (0.125, 4) (WAD_MD)
  Can a tuned single stream compete with parallel streams?

pre-tune Floyd AIMD or dynamically adjust

tune concurrent flows -- subdivide buffer
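The (1/(2k), k) pair follows from treating k standard streams as one aggregate: a loss halves only one of the k windows, so the aggregate drops by 1/(2k), and each stream adds one segment per RTT, so the aggregate adds k. The sketch below computes that pair and compares idealized sawtooths. It is a toy model with hypothetical function names: a loss is assumed to occur exactly when the window reaches a fixed peak, which real paths do not guarantee.

```python
def equivalent_aimd(k: int):
    """(multiplicative decrease, additive increase) of k standard streams seen as one."""
    return 1.0 / (2 * k), float(k)


def sawtooth(md: float, ai: float, peak: float, rtts: int):
    """Window (segments) per RTT, with a loss each time the window reaches peak."""
    cwnd = peak
    trace = []
    for _ in range(rtts):
        if cwnd >= peak:
            cwnd = cwnd * (1.0 - md)      # multiplicative decrease on loss
        else:
            cwnd = min(peak, cwnd + ai)   # additive increase otherwise
        trace.append(cwnd)
    return trace


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    standard = sawtooth(0.5, 1.0, peak=1000.0, rtts=2000)
    tuned = sawtooth(*equivalent_aimd(4), 1000.0, 2000)
    print(equivalent_aimd(4))
    print(mean(standard) / 1000.0, mean(tuned) / 1000.0)
```

In this model the tuned single stream keeps its average window much closer to the peak than a standard (0.5, 1) stream, which is the effect parallel streams achieve in aggregate.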
Net100 TCP tuning
Reorder threshold
  seeing more out-of-order packets
  WAD tunes a bigger reorder threshold
  Linux 2.4 does a good job already
  LBL to ORNL (using our TCP-over-UDP): dup3 case had 289 retransmits, but all were unneeded!
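Why a bigger reorder threshold avoids unneeded retransmits can be seen in a toy model of duplicate ACK counting. The sketch is a simplification with hypothetical function names: the receiver sends one cumulative ACK per arriving segment, nothing is lost, and the sender fast-retransmits as soon as the duplicate ACK count reaches the threshold.

```python
def max_dupack_run(arrivals):
    """Longest run of duplicate ACKs produced by an arrival order.

    arrivals lists segment indices (starting at 0) in the order they
    reach the receiver; no segment is lost.
    """
    received = set()
    next_expected = 0
    run = 0
    longest = 0
    for segment in arrivals:
        received.add(segment)
        if segment == next_expected:
            # In-order data (or a filled hole) advances the cumulative ACK.
            while next_expected in received:
                next_expected += 1
            run = 0
        else:
            # Out-of-order arrival repeats the previous ACK.
            run += 1
            longest = max(longest, run)
    return longest


def spurious_fast_retransmit(arrivals, threshold=3):
    """True if reordering alone would trigger a fast retransmit."""
    return max_dupack_run(arrivals) >= threshold


if __name__ == "__main__":
    reordered = [1, 2, 3, 0, 4, 5]   # segment 0 overtaken by three others
    print(spurious_fast_retransmit(reordered, threshold=3))
    print(spurious_fast_retransmit(reordered, threshold=9))
```

With the classic threshold of three duplicate ACKs the delayed segment is retransmitted even though it was never lost; with a threshold of 9, the value in the sample WAD config, the same arrival order triggers nothing.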
WAD could turn off delayed ACKs -- 2x improvement in recovery rate and slow-start
  Linux 2.4 already turns off delayed ACKs for initial slow-start

WARNING: could be unfair, probably stable; use only on intranet
Web100 has proven very useful for experimenting with TCP tuning options.
Futures
• Net100
  – analyze effectiveness of current tuning options
  – NTAF probes -- characterizing a path to tune a flow
  – additional tuning algorithms
  – parallel/multipath selection/tuning
  – WAD-to-WAD tuning
• Web100 extensions
  – Web100 trace files -- log all data efficiently
  – variable for count of duplicate data segments at receiver
  – remove wscale restriction
www.net100.org