
Network Application Performance

Deke Kassabian and Shumon Huque

ISC Networking & Telecommunications

February 2002 - Super Users Group

Introduction

What this talk is all about:

Network performance on the local area network and around campus

Network performance in the wide area and for advanced applications

Goal: acceptable performance, positive user experience

Who needs to be involved?

End Users

Researchers

Local Support Providers

Application Developers

System Programmers/Administrators

Network Engineers

What is performance?

“Performance” might mean …

Elapsed time for file transfers

Packet loss over a period of time

Percentage of data needing retransmission

Drop-outs in video or audio

Subjective “feeling” that feedback is “on time”

Throughput

Throughput is the amount of data that arrives per unit time.

“Goodput” is the amount of data that arrives per unit time, minus the amount of that data that was retransmitted.

Delay

Delay is a time measurement for data transfer:

One-way network delay for a bit in transit (NIC to NIC)

Delay for a total transfer (stack to stack)

Time from mouse click to the “operation is complete” message on screen (eyeball to eyeball)

Jitter

Variation in delay over time

Non-issue for non-realtime applications

May be problematic for some applications with real-time interactive requirements, such as video conferencing

E2E delay of 70 ms +/- 5 ms -> low jitter
E2E delay of 35 ms +/- 20 ms -> higher jitter

Some Contributors to Delay

Slow networks

Slow computers

Poor TCP/IP stacks on end-stations

Poorly written applications

Analysis of Delay

[Diagram: host A sends data to host B and receives an acknowledgement; the transfer time has three components: (1) insertion time, (2) propagation delay, (3) processing delay]


Analysis of Delay

Send 1,000 bits from A to B, with an acknowledgement, over 100 meters of fiber:

Data insertion: 0.0001 sec
Propagation: 0.0000004 sec
Processing (B): 0.01 sec
Ack insertion: 0.001 sec
Propagation: 0.0000004 sec
Processing (A): 0.01 sec

Total elapsed time: 0.0211008 seconds

Analysis of Delay

Add 2 switches and a router to the path:

Each switch adds 0.00002 sec
The router adds 0.002 sec

New total elapsed time: 0.0231408 seconds
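This arithmetic is easy to script. Below is a minimal Python sketch that reproduces both totals, using the slides’ own per-component numbers as constants:

    # Minimal sketch of the delay arithmetic on the two slides above.
    # All per-component delays are the slides' own numbers (seconds).

    DATA_INSERTION = 0.0001     # 1,000 bits onto the wire
    ACK_INSERTION  = 0.001      # acknowledgement onto the wire
    PROPAGATION    = 0.0000004  # 100 m of fiber, one direction
    PROCESSING     = 0.01       # per end-station

    # Data A->B, then ack B->A.
    direct = (DATA_INSERTION + PROPAGATION + PROCESSING +
              ACK_INSERTION + PROPAGATION + PROCESSING)
    print("direct path: %.7f sec" % direct)               # 0.0211008

    # Add 2 switches (0.00002 sec each) and a router (0.002 sec).
    switched = direct + 2 * 0.00002 + 0.002
    print("with switches + router: %.7f sec" % switched)  # 0.0231408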

Summary of Delay Analysis

Propagation delay is of little consequence in LANs, more of an issue for high bandwidth WANs.

Queueing delays are rarely major contributors.

Processing delay is almost always an issue.

Retransmission delays can be major contributors to poor network performance.

Speaker Change

What I’m going to talk about

More on delay contributors, their causes, and how to minimize them

Protocol stack behavior & tuning

Quality of Service (QoS)

Performance measurement tools

Operating system tuning examples

General comments about things you can do

Recap: Delay Contributors

Processing Delay

Retransmission Delay

Queueing Delay

Propagation Delay

Processing Delay

Time it takes to process a packet at an end-station or network node. Depends on: network protocol complexity, application code, computational power at the node, NIC efficiency, etc.

Two remedies, covered next:

Endstation tuning

Application tuning

Endstation Tuning

Good network hardware/NICs

Correct speed/duplex settings (auto-negotiation problems)

Sufficient CPU

Sufficient memory

Network protocol stack tuning: Path MTU discovery, jumbo frames, TCP window scaling, SACK, etc.

Ethernet Bandwidth/Duplex mode

Ethernet bandwidth: 10, 100, 1000 Mbps (10 Gigabit Ethernet soon)

Duplex modes: half-duplex, full-duplex

Auto-Negotiation

Mismatch detection: CRC/alignment errors, late collisions

Application Tuning

Optimize access to host resources

Pay attention to disk I/O issues

Pay attention to bus and memory issues

Know what concurrent activity may be interfering with the performance of the app

Tune application send/receive buffers

Efficient application protocol design

Positive end-user feedback: subjective perception of performance

Retransmission Delay

Causes:

Packet loss: bad hardware (NICs, switches, routers, transmission lines), congestion and queue drops

Out-of-order packet delivery: may be considered packet loss from the application’s perspective if it can’t re-order packets

Untimely delivery (delay): some apps may consider a packet lost if they don’t receive it in a timely fashion

Retransmission Delay (cont)

Mitigating retransmission delay:

Ensure working equipment, although some packet loss is unavoidable; e.g. most transmission lines have a BER (Bit Error Rate)

Reduce the time to recover from packet loss, e.g. a highly tuned network stack with more aggressive retransmission and recovery behavior

Forward Error Correction (FEC): very useful for time/delay-sensitive applications, and for cases when it’s expensive to retransmit data

Bit Errors on WAN paths

Bit Error Rate (BER) specs for networking interfaces/circuits may not be low enough:

1 bit-error in 10 billion bits (BER of 10^-10)

Assuming 1500-byte packets (12,000 bits), that’s a packet error rate of ~1 in 1 million

Over 10 hops => ~1 in 100,000 packet drop rate
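A quick sketch of this back-of-the-envelope math in Python, using the slide’s assumptions:

    # Packet loss implied by a given Bit Error Rate (BER).
    BER = 1e-10             # 1 bit-error in 10 billion bits
    PACKET_BITS = 1500 * 8  # 1500-byte packets = 12,000 bits

    # Probability that at least one bit in a packet is corrupted.
    per_hop = 1 - (1 - BER) ** PACKET_BITS   # ~1.2e-6, "1 in a million"
    print("per-hop packet error rate: %.2e" % per_hop)

    # Errors accumulate across a 10-hop path.
    path = 1 - (1 - per_hop) ** 10           # ~1.2e-5, "1 in 100,000"
    print("10-hop packet drop rate: %.2e" % path)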

Queueing Delay

Long queueing delays could be caused by:

Lame hardware (switches/routers): head-of-line blocking, insufficient switching fabric, insufficient horsepower

Unfavorable QoS treatment

Queueing Delay (cont)

How to reduce:

Use good network hardware

Improve the network architecture: reduce the number of switching/routing elements on the network path; richer network topology, more interconnections (the end user may not have influence over architecture)

Employ preferential queue scheduling algorithms (discussed later in the QoS section of this talk)

Propagation Delay

Restricted by the speed of light through the transmission medium

Can’t be changed, but rarely a concern in the campus/LAN environment

A concern on long-distance (WAN) paths, but some steps can be taken to increase performance (throughput) on such paths

Other delays and bottlenecks

Intermediary systems: DNS

Routing issues: route availability, asymmetric routing, routing protocol stability and convergence time

Firewalls

Tunnels (IPSec VPNs, IP-in-IP tunnels, etc.): router hardware can be poor at encap/decap

Throughput

Influenced by a number of variables:

All the delay factors we discussed

Window size (for TCP)

Bottleneck link capacity

End-station processing and buffering capacity

What I’m going to talk about next

Brief description of the TCP/IP protocol

How to improve TCP/IP performance

Transport: TCP vs UDP

Network apps use 2 main transport protocols:

TCP (Transmission Control Protocol)
Connection-oriented (telephone-like service)
Reliable: guarantees delivery of data
Flow control
Examples: Web (HTTP), Email (SMTP, IMAP)

UDP (User Datagram Protocol)
Connectionless (postal-system-like)
Unreliable: no guarantees of delivery
Examples: DNS, various types of streaming media
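To make the contrast concrete, here is a minimal Python sketch of the two socket types (host names and ports are placeholders):

    import socket

    # TCP: connection-oriented, reliable byte stream.
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.connect(("server.example.com", 8080))  # placeholder host/port
    tcp.sendall(b"GET / HTTP/1.0\r\n\r\n")     # delivered in order, or an error is raised
    reply = tcp.recv(4096)
    tcp.close()

    # UDP: connectionless datagrams, no delivery guarantee.
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.sendto(b"hello", ("server.example.com", 9000))  # may silently be lost
    udp.close()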

When to use TCP or UDP?

Many common apps use TCP because it’s convenient: TCP handles reliable delivery, retransmission of lost packets, re-ordering, flow control, etc.

You may want to use UDP if:

Delays introduced by ACKs are unacceptable

TCP congestion avoidance and flow control measures are unsuitable for your application

You want more control over how your data is transported over the network

Highly delay/jitter-sensitive apps often use UDP: audio-video conferencing, etc.

Network Stack Tuning

Jumbo Frames

Path MTU Discovery

TCP Extensions: Window Scaling (RFC 1323), Fast Retransmit, Fast Recovery, Selective Acknowledgements

Jumbo Frames

Increase the MTU used at the link layer, allowing larger maximum-sized frames

Increases network throughput: fewer, larger frames mean fewer CPU interrupts and less processing overhead for a given data transfer size

Some studies have shown Gigabit Ethernet using 9000-byte jumbo frames provided 50% more throughput and used 50% less CPU! (The default Ethernet MTU is 1500 bytes.)

Jumbo Frames (cont)

Pitfalls:

Not widely deployed yet; many network devices may not be capable of jumbo frames (they’ll look like bad frames)

May cause excessive IP fragmentation

BER may have more impact on jumbo frames; e.g. a single bit-error can cause a large amount of data to be lost and retransmitted

May have a negative impact on host processing requirements: more memory for buffering, newer NICs

Path MTU Discovery

MTU (Max Transmission Unit): the maximum sized frame allowed on the link

Path MTU: the minimum MTU on any network in the path between 2 hosts

IP fragmentation & reassembly

Path MTU Discovery

MSS (Max Segment Size)

What happens without PMTU discovery? Might select the wrong MTU and cause fragmentation; suboptimal selection of TCP MSS (536-byte default)

Path MTU Discovery (cont)

[Diagram: host A reaches host B through routers R1, R2 and R3; the link MTUs along the path are 9000, 4474, 9000 and 1500 bytes. The Path MTU is therefore 1500, and IP fragmentation may occur.]
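On systems that support it, an application can ask the kernel what segment size a connected TCP socket ended up with, which reflects PMTU discovery at work. A small sketch (the TCP_MAXSEG socket option is platform-dependent; the destination is a placeholder):

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("www.example.com", 80))  # placeholder destination

    # TCP_MAXSEG reports the MSS in effect for this connection; with
    # PMTU discovery working over a 1500-byte path, expect ~1460
    # (1500 minus 40 bytes of IP + TCP headers).
    print("effective MSS:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG))
    s.close()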

TCP Sliding Window

TCP uses a flow control method called “Sliding Window”

Allows the sender to send multiple segments before it has to wait for an ACK

Results in a faster transfer rate, because the sender doesn’t have to wait for an ACK each time a packet is sent

The receiver advertises a window size that tells the sender how much data it can send without waiting for an ACK

TCP Sliding Window (cont)

Slow Start

In actuality, TCP starts with a small window and slowly ramps it up (up to rwin)

The congestion window (cwnd) controls startup and limits throughput in the face of congestion:

cwnd is initialized to 1 segment

cwnd gets larger after every new ACK

cwnd gets smaller when packet loss is detected

Slow Start is actually exponential

Congestion Avoidance

Assumption: packet loss is caused by congestion

When congestion occurs, slow down the transmission rate:

Reset cwnd to 1 on a timeout

Use slow start until we reach the halfway point of where congestion occurred, then use linear increase

Increase cwnd by ~1 segment/RTT
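A toy Python simulation of this behavior, with cwnd in units of segments and one update per round trip (the loss times are arbitrary, chosen just to show the pattern):

    # Toy model of TCP slow start + congestion avoidance.
    def simulate(rtts=40, loss_at=(15, 30)):
        cwnd, ssthresh = 1.0, 64.0
        for t in range(rtts):
            if t in loss_at:          # timeout detected
                ssthresh = cwnd / 2   # remember the halfway point
                cwnd = 1.0            # restart from one segment
            elif cwnd < ssthresh:
                cwnd *= 2             # slow start: exponential growth
            else:
                cwnd += 1             # congestion avoidance: linear growth
            print("rtt %2d: cwnd %5.1f  ssthresh %5.1f" % (t, cwnd, ssthresh))

    simulate()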

TCP Behavior

Recovery after a loss can be very slow on today’s high delay/bandwidth links

[Graph (from Peter O’Neill, NCAR): cwnd over time: exponential increase during slow start, linear increase during congestion avoidance; a drop on packet loss (D-ACK) followed by retransmission; and slow start again after a timeout.]

TCP Throughput Acceleration

Time for TCP to accelerate from 0 to 100 Mbps, by round-trip time:

rtt (msec)    time to reach 100 Mbps (sec)
    5             0.216
   10             0.864
   20             3.45
   50            21.6
  100            86.4
  200           345

(From Phil Dykstra)
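These numbers follow from the linear-growth phase: cwnd grows by about one MSS per round trip, so filling a 100 Mbps pipe takes (BW*RTT/MSS) round trips, i.e. time proportional to RTT squared. A sketch that approximately reproduces the table (the 1448-byte MSS is an assumption chosen to make the constants line up):

    # Ramp time for TCP congestion avoidance: ~1 MSS of growth per RTT,
    # so time = (BW * rtt / MSS_bits) RTTs = BW * rtt**2 / MSS_bits.
    BW = 100e6            # target rate, bits/sec
    MSS_BITS = 1448 * 8   # assumed MSS

    for rtt_ms in (5, 10, 20, 50, 100, 200):
        rtt = rtt_ms / 1000.0
        print("rtt %3d ms -> %7.3f sec" % (rtt_ms, BW * rtt**2 / MSS_BITS))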

TCP Window Size Tuning

TCP performance depends on:

Transfer rate (bandwidth)

Round trip time

BW*Delay product

The TCP window should be sized at least as large as the BW*Delay product

BW*Delay Product

The BW*Delay product measures:

The amount of data that would fill the network pipe

The buffer space required at sender and receiver to achieve the maximum possible TCP throughput

The amount of unacknowledged data that TCP must handle in order to keep the pipe full

BW*Delay example

A path from Penn to Stanford has:

Round trip time: 60 ms
Bandwidth: 120 Mbps

BW * Delay = 60/1000 sec * 120,000,000 bits/sec = 7,200,000 bits = 7200 Kbits = 900 KBytes

So the TCP window should be at least 900 KB
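The same calculation in a few lines of Python:

    # BW*Delay product for the Penn-to-Stanford example.
    rtt = 60 / 1000.0      # round trip time: 60 ms
    bw  = 120e6            # bandwidth: 120 Mbps

    bdp_bits  = bw * rtt   # 7,200,000 bits
    bdp_bytes = bdp_bits / 8
    print("BW*Delay = %.0f bits = %.0f KB" % (bdp_bits, bdp_bytes / 1000))
    # => the TCP window should be at least ~900 KB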

TCP Window Scaling

RFC 1323: TCP Extensions for High Performance

Allows scaling of the TCP window size beyond 64KB (the window field is 16 bits)

Introduces a new TCP option

Note: in the previous example, TCP needs to support window scaling to use a 900KB window

Window Scaling Pitfalls

Why not always use large windows?

Might consume large memory resources

May not be useful for all applications

Isn’t useful in the campus/LAN environment

Fast Retransmit / Fast Recovery

TCP is required to send an immediate D-ACK (duplicate ACK) when an out-of-order packet is received

After 3 D-ACKs, the sending TCP retransmits only one segment

Also performs congestion avoidance, but not slow start

[Diagram: segments 1 through 7 in flight; a lost segment causes D-ACKs]

TCP Selective Acks (SACK)

RFC 2018

Allows TCP to efficiently recover from multiple segment losses within a window, without retransmitting the entire window

Enough about TCP

Performance depends on App

So, understand your application’s requirements (high throughput, low latency, low jitter), e.g.:

File transfer using TCP:
Needs high throughput
Intolerant of packet loss
May be more tolerant of delay

Interactive video conferencing application:
Tolerant of some loss
More intolerant of delay and jitter

Quality of Service (QoS)

A method to selectively allocate scarce network resources

A mechanism to offer varying degrees of service to varying classes of traffic

Service: delay, jitter, proportion of link bandwidth etc

Quality of Service (QoS) cont

Requires a deployed QoS infrastructure. Might require:

Traffic marking capabilities in hosts and network hardware

Traffic classification and identification capabilities

Multiple traffic queues with different service characteristics

Different queue servicing algorithms

Mechanisms to specify and enforce QoS policy

Signalling mechanisms

Examples: IEEE 802.1p, IP precedence, IntServ/RSVP, DiffServ, MPLS
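As one concrete piece of this picture, a host can mark its own traffic by setting the IP TOS/DSCP byte on a socket; whether the network honors the mark depends entirely on the deployed QoS policy. A sketch (the EF code point and destination are illustrative choices):

    import socket

    # DSCP "Expedited Forwarding" (46) occupies the upper six bits of
    # the old IP TOS byte: 46 << 2 == 0xB8.
    EF_TOS = 46 << 2

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, EF_TOS)
    s.sendto(b"marked datagram", ("receiver.example.com", 5004))  # placeholder
    s.close()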

Performance Measurement Tools

To measure the “real” performance of an app, you need to instrument the app with measurement code!

However, independent measurement of some common network performance metrics can be done

Two kinds: active and passive measurement

Active Measurement

Ping

Traceroute

Netperf: http://www.netperf.org/

Iperf: http://dast.nlanr.net/Projects/Iperf/

Pathchar: ftp://ftp.ee.lbl.gov/pathchar/

Pathrate: http://www.pathrate.org/

Mping
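In the same spirit as these tools, an application can make a very rough active measurement itself, e.g. by timing TCP connection handshakes (this includes connection setup overhead and is no substitute for the tools above; the host is a placeholder):

    import socket, time

    def rough_rtt(host, port=80, samples=5):
        # Rough RTT estimate: time the TCP three-way handshake.
        times = []
        for _ in range(samples):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            t0 = time.time()
            s.connect((host, port))
            times.append((time.time() - t0) * 1000.0)
            s.close()
        return min(times)  # min filters out scheduling noise

    print("~RTT: %.1f ms" % rough_rtt("www.example.com"))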

Passive Measurement

OCxMON/PCMon

Router/switch stats collected via SNMP; Netflow, etc.

tcpdump, snoop, etherfind

Some tuning examples

Microsoft Windows

Newer versions (Win98, Win2K, WinXP) support many of the features (window scaling, PMTU discovery, SACK, etc.)

May require registry tweaks to turn some of them on

TCPTune, a TCP stack tuner for Windows: http://moat.nlanr.net/Software/TCPtune/

More tuning examples

MacOS X [need to find out more, who knows?]

Supports window scaling:

$ sysctl net.inet.tcp.rfc1323
net.inet.tcp.rfc1323: 1

Socket buffer raising: kernel tunable kern.ipc.maxsockbuf

TCP send/receive buffer tuning, tunables supported:

net.inet.tcp.sendspace
net.inet.tcp.recvspace

More tuning examples

Linux

In /proc/sys/net/core/ set:

rmem_default
rmem_max
wmem_default
wmem_max

In /proc/sys/net/ipv4/ set:

tcp_window_scaling
tcp_sack

More tuning examples

Solaris 2.x - 8

ndd -set /dev/tcp tcp_max_buf xxx
ndd -set /dev/tcp tcp_xmit_hiwat xxx
ndd -set /dev/tcp tcp_recv_hiwat xxx
ndd -set /dev/ip ip_path_mtu_discovery 1
ndd -set /dev/tcp tcp_sack_permitted 2

Web100 Project

http://www.web100.org/

Enhance TCP capabilities with:

Better (finer-grained) kernel instrumentation

Automatic controls

Availability today: Linux (patches for the 2.4.16 kernel); being ported to other operating systems

Things you can do (WAN)

Make sure your app offers adequately sized receive windows and send buffers, but don’t run your system out of memory (see the sketch after this list)

Find out your path RTT with ping

Check your path with traceroute

Determine the bottleneck capacity and available bandwidth on the path

Make sure your OS uses Path MTU discovery

Make sure your OS uses TCP large windows, fast retransmit, and SACK
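For the first item, here is a sketch of sizing an application’s socket buffers to the path’s BW*Delay product; the OS may silently clamp the request to its configured maximum (e.g. kern.ipc.maxsockbuf or rmem_max/wmem_max above):

    import socket

    # Size send/receive buffers to the path's BW*Delay product.
    # Example path: 120 Mbps at 60 ms RTT => ~900 KB (see earlier slide).
    bdp_bytes = int(120e6 * 0.060 / 8)

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)

    # The kernel may cap the request; read back what was granted.
    print("send buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    print("recv buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))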

Things you can do (Campus)

Check your host (80% of the problems):

Bandwidth/duplex problems

Network stack tuning

Application tuning

Talk to campus networking folks

Conclusion

Understand performance requirements of your application

What are the issues in the campus/LAN environment? In the WAN environment?

What can you do, and whom can you ask for help?

Any Questions?

Deke Kassabian [email protected]

Shumon Huque [email protected]