Prentice Hall: High Performance TCP/IP Networking, Hassan-Jain
Chapter 13
TCP Implementation
Objectives
- Understand the structure of a typical TCP implementation
- Outline the implementation of extended standards for TCP over high-performance networks
- Understand the sources of end-system overhead in typical TCP implementations, and techniques to minimize them
- Quantify the effect of end-system overhead and buffering on TCP performance
- Understand the role of Remote Direct Memory Access (RDMA) extensions for high-performance IP networking
Contents
- Overview of TCP implementation
- High-performance TCP
- End-system overhead
- Copy avoidance
- TCP offload
Implementation Overview
Overall Structure (RFC 793)
Internal structure as specified in RFC 793 (Fig. 13.1)
Data Structure of TCP Endpoint
- Transmission control block (TCB): stores the connection state and related variables
- Transmit queue: buffers containing outstanding data
- Receive queue: buffers for data received but not yet forwarded to the higher layer
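The per-connection state described above can be sketched as a small data structure. The field names below (snd_una, snd_nxt, and so on) follow common RFC 793 conventions, but the layout itself is an illustrative sketch, not the book's definition:

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class TransmissionControlBlock:
    """Sketch of a TCB plus its two buffer queues (illustrative field names)."""
    state: str = "CLOSED"   # connection state-machine state
    snd_una: int = 0        # oldest unacknowledged sequence number
    snd_nxt: int = 0        # next sequence number to send
    rcv_nxt: int = 0        # next sequence number expected from the peer
    snd_wnd: int = 65535    # peer's advertised receive window (bytes)
    transmit_queue: deque = field(default_factory=deque)  # outstanding data
    receive_queue: deque = field(default_factory=deque)   # received, undelivered data

tcb = TransmissionControlBlock(state="ESTABLISHED")
tcb.transmit_queue.append(b"unacknowledged data")
```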
Buffering and Data Movement
- Buffer queues reside in the protocol-independent socket layer within the operating system kernel
- The TCP sender upcalls to the transmit queue to obtain data; the TCP receiver notifies the receive queue of correct arrival of incoming data
- BSD-derived kernels implement buffers as mbufs, which move data by reference and reduce the need to copy
- Most implementations commit buffer space to the queues lazily: queues consume memory only when the bandwidth of the network does not match the rate at which the TCP user produces or consumes data
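The move-by-reference idea behind mbufs can be illustrated in miniature with Python memoryview slices, which alias a buffer instead of copying it. This is an analogy for the technique, not the BSD mbuf API:

```python
# A user buffer holding two segments' worth of data (2 x 1460 bytes).
payload = bytearray(b"A" * 2920)
mss = 1460

# "Transmit queue" of references: each entry aliases a slice of the
# original buffer, so enqueueing costs no per-byte copy.
queue = []
view = memoryview(payload)
for off in range(0, len(payload), mss):
    queue.append(view[off:off + mss])   # reference, not a copy

# Every queued slice still points at the same underlying buffer.
assert all(seg.obj is payload for seg in queue)
```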
User Memory Access
Provides for movement of data to and from the memory of the TCP user
- Copy semantics: SEND and RECEIVE are defined with copy semantics, so the user may modify a send buffer as soon as the SEND call returns
- Direct access: allows TCP to access the user buffers directly, bypassing copying of data
TCP Data Exchange
TCP endpoints cooperate by exchanging segments. Each segment contains (Fig. 13.3):
- Sequence number (seg.seq)
- Segment data length (seg.len)
- Status bits
- Acknowledgement sequence number (seg.ack)
- Advertised receive window size (seg.wnd)
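A toy encoding of the fields listed above (seg.seq, seg.ack, status bits, seg.wnd, with seg.len implied by the payload length) can make the exchange concrete. The layout below is a deliberately simplified sketch, not the on-the-wire TCP header format:

```python
import struct

# Simplified segment header: seq (32 bits), ack (32), flags (16), wnd (16).
SEG_FMT = "!IIHH"
SEG_HDR_LEN = struct.calcsize(SEG_FMT)   # 12 bytes

def pack_segment(seq, ack, flags, wnd, data=b""):
    """Serialize a segment: fixed header followed by the payload."""
    return struct.pack(SEG_FMT, seq, ack, flags, wnd) + data

def unpack_segment(raw):
    """Recover the header fields; seg.len is derived from the payload."""
    seq, ack, flags, wnd = struct.unpack(SEG_FMT, raw[:SEG_HDR_LEN])
    return {"seq": seq, "ack": ack, "flags": flags, "wnd": wnd,
            "len": len(raw) - SEG_HDR_LEN}

seg = pack_segment(seq=1000, ack=500, flags=0x10, wnd=65535, data=b"payload")
fields = unpack_segment(seg)
```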
Data Retransmissions
- The TCP sender uses a retransmission timer to drive retransmission of unacknowledged data: a segment is retransmitted if the timer fires before it is acknowledged
- Retransmission timeout (RTO):
  RTO < RTT: aggressive; too many spurious retransmissions
  RTO > RTT: conservative; low utilisation because the connection sits idle
- In practice, an adaptive retransmission timer with back-off is used (specified in RFC 2988)
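The adaptive timer of RFC 2988 keeps a smoothed RTT estimate (SRTT) and an RTT variance (RTTVAR), sets RTO = SRTT + 4*RTTVAR with a one-second floor, and doubles the RTO after each timeout. A direct transcription of those rules:

```python
class RtoEstimator:
    """Adaptive retransmission timer per RFC 2988."""
    ALPHA, BETA, K = 1 / 8, 1 / 4, 4

    def __init__(self):
        self.srtt = None
        self.rttvar = None
        self.rto = 3.0          # RFC 2988 initial RTO (3 seconds)

    def sample(self, r):
        """Fold one RTT measurement r (seconds) into the estimate."""
        if self.srtt is None:                     # first measurement
            self.srtt, self.rttvar = r, r / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = max(1.0, self.srtt + self.K * self.rttvar)   # 1 s floor

    def back_off(self):
        """Exponential back-off after a retransmission timeout."""
        self.rto = min(self.rto * 2, 60.0)        # common 60 s ceiling
```

For example, a first sample of 0.5 s gives SRTT = 0.5, RTTVAR = 0.25, and RTO = 0.5 + 4 * 0.25 = 1.5 s.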
Congestion Control
- A retransmission event indicates (to the TCP sender) that the network is congested
- Congestion management is a function of the end-systems: RFC 2581 requires TCP end-systems to respond to congestion by reducing their sending rate
- AIMD (Additive Increase Multiplicative Decrease): the TCP sender probes for available bandwidth on the network path; upon detecting congestion, it multiplicatively reduces cwnd
- AIMD achieves fairness among TCP connections
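One congestion-avoidance round of AIMD can be sketched as follows. This is a simplification that ignores slow start and byte-counting details:

```python
def aimd_step(cwnd, mss, congested):
    """One AIMD round: halve cwnd on congestion, else grow by one MSS.

    cwnd and mss are in bytes; congested is True when loss was
    detected in the previous round.
    """
    if congested:
        return max(mss, cwnd // 2)   # multiplicative decrease, floor of 1 MSS
    return cwnd + mss                # additive increase

cwnd = 10 * 1460                     # start at 10 segments
cwnd = aimd_step(cwnd, 1460, congested=False)   # probe: grows by one MSS
cwnd = aimd_step(cwnd, 1460, congested=True)    # loss: cwnd is halved
```

The sawtooth this rule produces, with all competing senders halving on shared congestion events and growing at the same additive rate, is what drives connections toward a fair share.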
High Performance TCP
TCP Implementation with High Bandwidth-Delay Product
- High bandwidth-delay products arise on high-speed networks (e.g. optical networks) and high-latency networks (e.g. satellite networks), collectively called Long Fat Networks (LFNs)
- LFNs require window sizes larger than the 16 bits originally defined for TCP
- The window scale option allows a TCP endpoint to advertise a large window size (up to about 1 Gbyte); the scale factor is negotiated at connection setup and counts the window in units of up to 16K
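The scale factor is a shift count of at most 14, so the 16-bit window field can be counted in units of up to 16K (2^14 bytes), for a maximum window just under 1 Gbyte. A small helper (an illustrative sketch, not a kernel routine) picks the smallest shift that covers a desired receive buffer:

```python
def window_scale_shift(max_window_bytes):
    """Smallest window-scale shift count whose scaled 16-bit window
    field can represent max_window_bytes; capped at 14 per the option."""
    shift = 0
    while (65535 << shift) < max_window_bytes and shift < 14:
        shift += 1
    return shift
```

With a shift of 14, the largest advertisable window is 65535 << 14 bytes, about 1 Gbyte; anything larger still yields the capped shift of 14.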
Round Trip Time Estimation
- Accuracy of RTT estimation depends on frequent sample measurements of RTT; the percentage of segments sampled decreases with larger windows, and may be insufficient for LFNs
- Timestamp option: enables the sender to compute RTT samples from each acknowledgement, and provides a safeguard against accepting old segments with wrapped sequence numbers
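Both uses of the timestamp option reduce to simple arithmetic, sketched below. The function names are illustrative; the wraparound test follows the PAWS idea of comparing timestamps modulo 2^32:

```python
def rtt_from_timestamp(now_ms, echoed_tsval_ms):
    """RTT sample from the timestamp option: the ACK echoes the clock
    value the sender stamped (TSecr), so one subtraction per ACK
    yields a sample regardless of window size."""
    return now_ms - echoed_tsval_ms

def paws_ok(tsval, ts_recent, modulus=1 << 32):
    """Wrapped-sequence safeguard sketch: accept a segment only if its
    timestamp is not older than the last accepted one, compared
    modulo 2**32 so clock wraparound is handled."""
    return ((tsval - ts_recent) % modulus) < modulus // 2

# Sender stamped TSval = 1000 ms; the ACK echoing it arrives at 1048 ms,
# giving a 48 ms RTT sample.
sample = rtt_from_timestamp(1048, 1000)
```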
Path MTU Discovery
- Transfer is most efficient when using the largest MSS that avoids segmentation
- Path MTU discovery enables the TCP sender to automatically discover the largest acceptable MSS
- A TCP implementation must correctly handle dynamic changes to the MSS: it never leaves more than 2*MSS bytes of data unacknowledged, and the TCP sender may need to re-segment data for retransmission
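The two computations involved, deriving the MSS from a discovered path MTU and re-segmenting unacknowledged data when the MSS shrinks, can be sketched as follows (illustrative helpers, assuming IPv4 with no IP or TCP options):

```python
def mss_for_mtu(mtu):
    """IPv4 MSS: path MTU minus 20-byte IP header and 20-byte TCP header."""
    return mtu - 40

def segment_for_retransmit(unacked, mss):
    """Re-split unacknowledged data into segments no larger than the
    (possibly reduced) MSS before retransmission."""
    return [unacked[i:i + mss] for i in range(0, len(unacked), mss)]

# The path MTU drops to 576 (the IPv4 minimum-reassembly size), so
# 2000 bytes of outstanding data must be re-segmented at MSS = 536.
segments = segment_for_retransmit(b"x" * 2000, mss_for_mtu(576))
```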
End-System Overhead
Reduce End-System Overhead
- TCP imposes processing overhead in the operating system, which adds directly to latency and consumes a significant share of CPU cycles and memory
- Reducing this overhead can improve application throughput
Relationship Between Bandwidth and CPU Utilization
Achievable Throughput for Host-Limited Systems
Sources of Overhead for TCP/IP (Fig. 13.5)
- Per-transfer overhead
- Per-packet overhead
- Per-byte overhead
Per-Packet Overhead
- Increasing packet size can mitigate the impact of per-packet and per-segment overhead
- Increasing segment size S increases achievable bandwidth (Fig. 13.6): as packet size grows, the effect of per-packet overhead becomes less significant
- Interrupts are a significant source of per-packet overhead
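The relationship between segment size and achievable bandwidth can be captured by a simple cost model: each segment costs a fixed per-packet overhead plus a per-byte overhead, so bandwidth is S / (o_p + o_b * S) and approaches 1/o_b as S grows. The cost figures below are illustrative assumptions, not measurements from the book:

```python
def achievable_bandwidth(seg_size, per_packet_cost, per_byte_cost):
    """Bytes/second a host can process under a two-term cost model:
    each segment of seg_size bytes costs per_packet_cost seconds of
    fixed overhead plus per_byte_cost seconds for every byte."""
    return seg_size / (per_packet_cost + per_byte_cost * seg_size)

# Assumed illustrative costs: 20 us per packet, 5 ns per byte.
bw_small = achievable_bandwidth(536, 20e-6, 5e-9)     # small segments
bw_large = achievable_bandwidth(8960, 20e-6, 5e-9)    # jumbo-frame segments
# Larger segments amortize the fixed per-packet cost over more bytes,
# so bw_large exceeds bw_small, but both stay below the 1/o_b limit.
```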
Relationship between Packet Size and Achievable Bandwidth
Relationship between Packet Overhead and Bandwidth
Checksum Overhead
- Checksumming is a source of per-byte overhead
- Ways to reduce checksum overhead:
  Complete multiple steps in a single traversal of the data to reduce per-byte overhead
  Integrate checksumming with the data copy
  Compute the checksum in hardware
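The per-byte cost in question is the Internet checksum (RFC 1071): a one's-complement sum of 16-bit words that must touch every byte of the segment, which is exactly why folding it into the data copy or into hardware pays off. A straightforward software version:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit
    big-endian words, with carries folded back in, then complemented."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length data
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # fold any carry back in
    return ~total & 0xFFFF
```

The loop is the single traversal the slide refers to: an integrated copy-and-checksum routine would move each word to its destination inside this same pass instead of walking the buffer twice.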
Copy Avoidance
Copy Avoidance for High-Performance TCP
- Page remapping: uses virtual memory to reduce copying across the TCP/user interface; typically resides at the socket layer in the OS kernel
- Scatter/gather I/O: does not require copy semantics; entails a comprehensive restructuring of OS and I/O interfaces
- Remote Direct Memory Access (RDMA): steers incoming data directly into user-specified buffers; IETF standards under way
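The gather-write half of scatter/gather I/O is visible even from user space: POSIX writev() hands the kernel a list of separate buffers, so a protocol header and its payload need not be coalesced into one contiguous copy first. A minimal sketch using Python's os.writev (POSIX-only), with a pipe standing in for a connected socket:

```python
import os

header = b"HDR:"               # e.g. an application-level header
payload = b"application data"  # e.g. data still sitting in a user buffer

r, w = os.pipe()               # stand-in for a connected socket
# Gather write: the kernel pulls from both buffers in one call,
# with no user-space concatenation copy.
written = os.writev(w, [header, payload])
os.close(w)

received = os.read(r, 1024)    # arrives as one contiguous byte stream
os.close(r)
```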
TCP Offload
TCP Offload
- Supports TCP/IP protocol functions directly on the network adapter (NIC), e.g. protocol processing and TCP checksum offloading
- Significantly reduces per-packet overheads for TCP/IP protocol processing
- Helps to avoid expensive copy operations