Transport Layer: Outline - TU Berlin
- Transport-layer services
- Multiplexing and demultiplexing
- Connectionless transport: UDP
- Principles of reliable data transfer
- Connection-oriented transport: TCP
  - Segment structure
  - Reliable data transfer
  - Connection management
  - Flow control
- Principles of congestion control
- TCP congestion control

TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
- Full duplex data:
  - Bi-directional data flow in same connection
  - MSS: maximum segment size
- Connection-oriented:
  - Handshaking (exchange of control messages) initializes sender and receiver state before data exchange
- Flow controlled:
  - Sender will not overwhelm receiver
- Congestion controlled:
  - Sender will not overwhelm network
- Point-to-point:
  - One sender, one receiver
- Reliable, in-order byte stream:
  - No "message boundaries"
- Pipelined:
  - TCP congestion and flow control set window size
- Send and receive buffers

[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the other application reads data through its socket door.]

TCP segment structure
Header layout (each row is 32 bits wide):
- Source port #, dest port #
- Sequence number
- Acknowledgement number
- Head len, not used, flags U A P R S F, rcvr window size
- Checksum, pointer to urgent data
- Options (variable length)
- Application data (variable length)

Field notes:
- Sequence and acknowledgement numbers: counting by bytes of data (not segments!)
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- Checksum: Internet checksum (as in UDP)
- Rcvr window size: # bytes rcvr willing to accept
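
The fixed 20-byte part of this header can be unpacked in a few lines. The following is a minimal Python sketch, not a full parser: the function name `parse_tcp_header` and the dictionary keys are illustrative choices, the bit positions assume the classic layout from the slide (4-bit header length, then the six flags U, A, P, R, S, F in the low bits), and options are skipped rather than decoded.

```python
import struct

# Flag bit masks in slide order (URG, ACK, PSH, RST, SYN, FIN).
FLAG_BITS = {"URG": 0x20, "ACK": 0x10, "PSH": 0x08,
             "RST": 0x04, "SYN": 0x02, "FIN": 0x01}

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (hypothetical helper, options skipped)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    # Network byte order: two 16-bit ports, two 32-bit numbers, four 16-bit fields.
    (src_port, dst_port, seq, ack,
     offset_and_flags, window, checksum, urgent_ptr) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    header_len = (offset_and_flags >> 12) * 4  # head len counts 32-bit words
    flags = {name for name, bit in FLAG_BITS.items() if offset_and_flags & bit}
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags,
        "window": window, "checksum": checksum, "urgent_ptr": urgent_ptr,
        "data": segment[header_len:],
    }

# Example: a hand-built SYN+ACK segment with no options and 2 bytes of data.
example = struct.pack("!HHIIHHHH", 80, 50000, 79, 43,
                      (5 << 12) | 0x12, 4096, 0, 0) + b"hi"
fields = parse_tcp_header(example)
```

The checksum is only extracted here, not verified; verifying it would also need the IP pseudo-header, which is outside this sketch.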

TCP reliable data transfer
- TCP creates rdt service on top of IP's unreliable service
- Pipelined segments
- Cumulative ACKs
- TCP uses a single retransmission timer
- Retransmissions are triggered by:
  - Timeout events
  - Duplicate ACKs
- Initially consider simplified TCP sender:
  - Ignore duplicate ACKs
  - Ignore flow control, congestion control
  - One-way data flow

TCP seq. #s and ACKs
Seq. #s:
- Byte-stream "number" of first byte in segment's data
ACKs:
- Seq # of next byte expected from other side
- Cumulative ACK
Q: How does the receiver handle out-of-order segments?
- A: TCP spec doesn't say; it is up to the implementer

Simple telnet scenario (Host A and Host B, time runs downward):
1. User types 'C'. A -> B: Seq=42, ACK=79, data = 'C'
2. Host B ACKs receipt of 'C', echoes back 'C'. B -> A: Seq=79, ACK=43, data = 'C'
3. Host A ACKs receipt of echoed 'C'. A -> B: Seq=43, ACK=80
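
The numbers in the telnet scenario follow from one rule: the ACK number is the sequence number of the received segment plus the number of data bytes it carried. A small Python sketch of that arithmetic, where the helper name `next_expected` is made up for illustration and wraparound of the 32-bit sequence space is handled with a simple modulo:

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide

def next_expected(seq: int, data: bytes) -> int:
    """ACK number a receiver returns after getting `data` starting at `seq`."""
    return (seq + len(data)) % SEQ_SPACE

# Host A sends 'C' with Seq=42; Host B acknowledges the next byte it expects.
ack_from_b = next_expected(42, b"C")
# Host B echoes 'C' with Seq=79; Host A acknowledges likewise.
ack_from_a = next_expected(79, b"C")
```

This gives ACK=43 from Host B and ACK=80 from Host A, matching the scenario.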

TCP sender events
Data received from app:
- Create segment with seq #
- Seq # is byte-stream number of first data byte in segment
- Start timer if not already running (think of the timer as belonging to the oldest unACKed segment)
- Expiration interval: TimeOutInterval
Timeout:
- Retransmit the one segment that caused the timeout
- Restart timer
ACK received:
- If it acknowledges previously unACKed segments:
  - Update what is known to be ACKed
  - Restart timer if there are outstanding segments

TCP: reliable data transfer
Simplified sender, assuming:
- One-way data transfer
- No flow control, no congestion control

[State diagram: a single "wait for event" state with three transitions back to itself]
- event: data received from application above -> create, send segment
- event: timer timeout -> retransmit segment
- event: ACK received, with ACK # y -> ACK processing

TCP sender (simplified)

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
  switch (event)

  event: data received from application above
    create TCP segment with sequence number nextseqnum
    if (timer currently not running)
      start timer
    pass segment to IP
    nextseqnum = nextseqnum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    restart timer

  event: ACK received, with ACK field value of y
    if (y > sendbase) {   /* cumulative ACK of all data up to y */
      sendbase = y
      if (there are currently not-yet-acknowledged segments) {
        restart timer
      }
    }
} /* end of loop forever */
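
The three event handlers of the simplified sender can be tried out as ordinary Python. This is a sketch under stated assumptions, not an implementation of TCP: the class name `SimplifiedSender`, the `wire` list that stands in for "pass segment to IP", and the boolean `timer_running` flag are all invented for illustration; the sketch also stops the timer once nothing is outstanding and ignores sequence-number wraparound.

```python
class SimplifiedSender:
    """Event handlers of the simplified sender (no flow or congestion control)."""

    def __init__(self, initial_seq: int):
        self.send_base = initial_seq   # oldest unacknowledged byte
        self.next_seq = initial_seq    # sequence number for the next new byte
        self.unacked = {}              # seq -> data, segments awaiting an ACK
        self.timer_running = False
        self.wire = []                 # (seq, data) tuples "passed to IP"

    def on_app_data(self, data: bytes) -> None:
        self.unacked[self.next_seq] = data
        if not self.timer_running:
            self.timer_running = True          # start timer
        self.wire.append((self.next_seq, data))
        self.next_seq += len(data)

    def on_timeout(self) -> None:
        if not self.unacked:
            self.timer_running = False
            return
        oldest = min(self.unacked)             # smallest unACKed sequence number
        self.wire.append((oldest, self.unacked[oldest]))
        self.timer_running = True              # restart timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:                 # cumulative ACK of all data up to y
            self.send_base = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            # Restart the timer only if something is still outstanding.
            self.timer_running = bool(self.unacked)

sender = SimplifiedSender(initial_seq=92)
sender.on_app_data(b"x" * 8)    # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)   # Seq=100, 20 bytes
sender.on_timeout()             # retransmits only Seq=92
sender.on_ack(120)              # cumulative ACK covers both segments
```

After the last call the sender has `send_base == 120`, nothing outstanding, and a stopped timer.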
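
TCP retransmission scenarios
(Host A sends, Host B receives; time runs downward)

Lost ACK scenario:
1. A -> B: Seq=92, 8 bytes data
2. B -> A: ACK=100, lost (X)
3. Seq=92 timeout at A; A -> B: Seq=92, 8 bytes data (retransmission)
4. B -> A: ACK=100; SendBase = 100

Premature timeout scenario:
1. A -> B: Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B -> A: ACK=100 and ACK=120, but the Seq=92 timeout fires before they arrive
3. A -> B: Seq=92, 8 bytes data (retransmission)
4. ACK=100 arrives (SendBase = 100), then ACK=120 (SendBase = 120)
5. B -> A: ACK=120 for the retransmitted segment; SendBase = 120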

TCP retransmission scenarios (2)

Cumulative ACK scenario:
1. A -> B: Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B -> A: ACK=100, lost (X)
3. B -> A: ACK=120 arrives before the timeout; SendBase = 120

TCP round trip time and timeout

Q: How to set the TCP timeout value?
- Longer than RTT
  - Note: RTT will vary
- Too short: premature timeout
  - Unnecessary retransmissions
- Too long: slow reaction to segment loss

Q: How to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - Ignore retransmissions, cumulatively ACKed segments
- SampleRTT will vary; want estimated RTT "smoother"
  - Use several recent measurements, not just current SampleRTT

TCP round trip time and timeout

EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT

- Exponential weighted moving average
- Influence of a given sample decreases exponentially fast
- Typical value of α: 0.125
- Key observation:
  - At high loads round trip variance is high

Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; two curves: SampleRTT and EstimatedRTT.]

TCP round trip time and timeout: Setting the timeout
- EstimatedRTT plus "safety margin"
  - Large variation in EstimatedRTT -> larger safety margin
- First estimate how much SampleRTT deviates from EstimatedRTT:

  DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
  (typically, β = 0.25)

- Then set the timeout interval:

  TimeoutInterval = EstimatedRTT + 4 * DevRTT
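
The three formulas (EstimatedRTT, DevRTT, TimeoutInterval) fit into a small estimator class. The Python sketch below is illustrative only. Two points are assumptions not stated on the slides: the estimator is seeded from the first sample (EstimatedRTT = sample, DevRTT = sample / 2), and DevRTT is updated with the EstimatedRTT from before the new sample is folded in. The class and method names are invented for the example.

```python
class RttEstimator:
    """EWMA round-trip estimator with the typical alpha = 0.125, beta = 0.25."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        # Assumed initialisation (not on the slides): seed from the first sample.
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt: float) -> None:
        # Assumption: the deviation uses the estimate from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self) -> float:
        # EstimatedRTT plus a safety margin of four deviations.
        return self.estimated_rtt + 4 * self.dev_rtt

est = RttEstimator(first_sample=100.0)   # milliseconds
est.update(200.0)                        # one unusually slow sample
```

With a first sample of 100 ms followed by one of 200 ms, EstimatedRTT moves only to 112.5 ms, while DevRTT rises to 62.5 ms and the timeout to 362.5 ms: the safety margin reacts to the variation much more strongly than the average does.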
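
Retransmission ambiguity
[Figure: two timelines between hosts A and B, each showing an original transmission, an RTO expiry, a retransmission, and an ACK; in one of them a transmission is lost (X). The sender cannot tell whether the ACK answers the original transmission or the retransmission, so the measured SampleRTT is ambiguous.]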

Karn's RTT estimator
- Accounts for retransmission ambiguity
  - If a segment has been retransmitted: don't count RTT sample on ACKs for this segment
- If retransmission timer expires
  - Double retransmission TimeoutInterval
  - Do not use RTT estimate to calculate TimeoutInterval until successful retransmission
- Timer restarted (not due to timeout)
  - Reuse RTT estimate
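
One way to read these rules is as a backoff multiplier layered on top of the RTT-based timeout. The Python sketch below follows that reading and is an assumption-laden illustration: the class `KarnTimeout`, its `backoff` multiplier, and the choice to clear the backoff on any arriving ACK are this example's interpretation of the bullets, not a statement of how a particular TCP stack behaves.

```python
class KarnTimeout:
    """Timeout with exponential backoff; samples from retransmitted segments are dropped."""

    def __init__(self, base_interval: float):
        self.base_interval = base_interval  # stands for EstimatedRTT + 4 * DevRTT
        self.backoff = 1                    # doubled on every timer expiry
        self.accepted_samples = []          # samples that may feed the RTT estimate

    def timeout_interval(self) -> float:
        return self.base_interval * self.backoff

    def on_timer_expired(self) -> None:
        self.backoff *= 2                   # double the retransmission timeout

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> None:
        if not was_retransmitted:
            self.accepted_samples.append(sample_rtt)
        # Assumed reading: the segment got through, so reuse the RTT-based value.
        self.backoff = 1

timer = KarnTimeout(base_interval=1.0)
timer.on_timer_expired()
timer.on_timer_expired()
backed_off = timer.timeout_interval()                   # doubled twice
timer.on_ack(sample_rtt=2.5, was_retransmitted=True)    # ambiguous, ignored
timer.on_ack(sample_rtt=0.3, was_retransmitted=False)   # clean sample, kept
```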

Timestamp extension
- Used to improve timeout mechanism by more accurate measurement of RTT
- When sending a packet, insert current timestamp into option
  - 4 bytes for the sender's timestamp value, 4 bytes for the echoed timestamp
- Receiver echoes timestamp in ACK
  - Actually will echo whatever is in timestamp
- Removes retransmission ambiguity
  - Can get RTT sample on any packet

Timer granularity
- Many TCP implementations set RTO in multiples of 200, 500, or 1000 ms
- Why?
  - Avoid spurious timeouts: RTTs can vary quickly due to cross traffic
  - Make timer interrupts efficient

Fast retransmit
- Time-out period often relatively long:
  - Long delay before resending lost packet
- Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If a segment is lost, there will likely be many duplicate ACKs
- If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  - Fast retransmit: resend segment before timer expires

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      restart timer
  }
  else {   /* duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
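
The duplicate-ACK counter is easy to exercise in isolation. The following Python sketch is hedged accordingly: `FastRetransmitSender` and its `fast_retransmits` list are hypothetical names, only ACKs equal to SendBase are counted as duplicates (older, stale ACKs are silently ignored, which is a simplification), and timers are left out.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and records which segment would be resent."""

    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_ack_count = 0
        self.fast_retransmits = []   # sequence numbers resent before a timeout

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            # New data acknowledged: advance and reset the duplicate counter.
            self.send_base = y
            self.dup_ack_count = 0
        elif y == self.send_base:
            # Duplicate ACK for data that was already acknowledged.
            self.dup_ack_count += 1
            if self.dup_ack_count == 3:
                self.fast_retransmits.append(y)   # resend segment starting at y

s = FastRetransmitSender(send_base=100)
for ack in (100, 100, 100, 100, 120):
    s.on_ack(ack)
```

The third duplicate triggers exactly one retransmission of the segment starting at byte 100; the fourth duplicate does not trigger another, and the ACK for 120 resets the counter.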

Delayed ACK
- It is inefficient to send too many ACK-only packets
- Why?
  - No data => at least 40 bytes (IP and TCP headers) for 1 byte of information
- Goal:
  - Wait for additional data to piggyback the ACK on a data packet
- Implementation
  - Try not to ACK every packet, but only every second one
  - Wait for at most 200 ms
  - ACK any out-of-order data

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver -> TCP receiver action
1. In-order segment with expected seq #; all data up to expected seq # already ACKed
   -> Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
2. In-order segment with expected seq #; one other segment has ACK pending
   -> Immediately send single cumulative ACK, ACKing both in-order segments
3. Out-of-order segment with higher-than-expected seq #; gap detected
   -> Immediately send duplicate ACK, indicating seq # of next expected byte
4. Segment that partially or completely fills gap
   -> Immediately send ACK, provided that segment starts at lower end of gap
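
The four rows of the ACK generation table amount to a small decision function. The Python sketch below encodes them under assumptions of its own: the function name, its boolean parameters, and the returned strings are invented, and a segment that starts below the expected sequence number (a case the table does not list) is simply treated like any other unexpected segment.

```python
def receiver_action(seg_seq: int, expected_seq: int,
                    ack_pending: bool, gap_exists: bool) -> str:
    """Pick the receiver's ACK behaviour for one arriving segment."""
    if seg_seq > expected_seq:
        # Row 3: out-of-order segment, a gap is detected.
        return "send duplicate ACK now"
    if seg_seq < expected_seq:
        # Not in the table: assumed to be old data, re-ACKed immediately.
        return "send duplicate ACK now"
    if gap_exists:
        # Row 4: segment starts at the lower end of the gap.
        return "send ACK now"
    if ack_pending:
        # Row 2: one earlier in-order segment is still waiting for its ACK.
        return "send cumulative ACK now"
    # Row 1: everything so far is ACKed, so wait (up to 500 ms) for more data.
    return "delay ACK"

actions = [
    receiver_action(100, 100, ack_pending=False, gap_exists=False),
    receiver_action(100, 100, ack_pending=True, gap_exists=False),
    receiver_action(120, 100, ack_pending=False, gap_exists=False),
    receiver_action(100, 100, ack_pending=False, gap_exists=True),
]
```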

TCP connection management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments
- Initialize TCP variables:
  - Seq. #s
  - Buffers, flow control info (e.g. RcvWindow)
- Client: connection initiator
    Socket clientSocket = new Socket("hostname", portNumber);
- Server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();

Connection establishment
- Use 3-way handshake

[Figure: message exchange between A and B]
1. A -> B: SYN + Seq A
2. B -> A: SYN + ACK-A + Seq B
3. A -> B: ACK-B

Sequence number selection
- Why not simply choose 0?
- Must avoid overlap with earlier incarnation

TCP connection: Three-way handshake
Step 1: Client end system sends TCP SYN control segment to server
- Specifies initial seq #
- Specifies initial window #
Step 2: Server end system receives SYN, replies with SYNACK control segment
- ACKs received SYN
- Allocates buffers
- Specifies server-to-client initial seq. #
- Specifies initial window #
Step 3: Client system receives SYNACK, replies with ACK segment, which may contain data
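
The sequence and acknowledgement numbers in the three steps can be written down explicitly. The Python sketch below only models the numbering, using plain dictionaries as stand-ins for segments; the function name and field names are hypothetical, and it relies on the fact that a SYN occupies one sequence number, so each side acknowledges the peer's initial sequence number plus one.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers wrap at 32 bits

def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three handshake segments as dictionaries (numbering only)."""
    syn = {"from": "client", "flags": {"SYN"}, "seq": client_isn}
    syn_ack = {"from": "server", "flags": {"SYN", "ACK"},
               "seq": server_isn,
               "ack": (client_isn + 1) % SEQ_SPACE}   # ACKs the received SYN
    ack = {"from": "client", "flags": {"ACK"},
           "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}       # ACKs the server's SYN
    return [syn, syn_ack, ack]

segments = three_way_handshake(client_isn=1000, server_isn=5000)
```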

TCP connection management (2)
Closing a connection:

Client closes socket:
    clientSocket.close();

Step 1: Client end system sends TCP FIN control segment to server
Step 2: Server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client "close" sends FIN to server; server replies with ACK; server "close" sends FIN to client.]

TCP connection management (3)
Step 3: Client receives FIN, replies with ACK.
- Enters "timed wait": will respond with ACK to received FINs
Step 4: Server receives ACK. Connection closed.
Note: With small modification, can handle simultaneous FINs.

[Figure: client (closing) sends FIN; server replies with ACK, then (closing) sends its own FIN; client replies with ACK and stays in timed wait before it is closed; server is closed once it receives the ACK.]

Tear-down packet exchange
[Figure: packet exchange between Sender and Receiver with the labels FIN, FIN-ACK, Data write, Data ack, FIN, FIN-ACK.]

TCP connection management (cont.)
[Figure: TCP client lifecycle]

TCP connection management (cont.)
[Figure: TCP server lifecycle]

Detecting half-open connections

    TCP A                                        TCP B
1.  (CRASH)                                      (send 300, receive 100)
2.  CLOSED                                       ESTABLISHED
3.  SYN-SENT --> <SEQ=400><CTL=SYN>          --> (??)
4.  (!!)     <-- <SEQ=300><ACK=100><CTL=ACK> <-- ESTABLISHED
5.  SYN-SENT --> <SEQ=100><CTL=RST>          --> (Abort!!)
6.  SYN-SENT                                     CLOSED
7.  SYN-SENT --> <SEQ=400><CTL=SYN>          -->

TCP flow control
- Receive side of TCP connection has a receive buffer
- App process may be slow at reading from buffer
- Speed-matching service: match the send rate to the receiving app's drain rate

Flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

TCP flow control: How it works
(Suppose TCP receiver discards out-of-order segments)
- Spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
- Rcvr advertises spare room by including value of RcvWindow in segments
- Sender limits unACKed data to RcvWindow
  - Guarantees receive buffer doesn't overflow
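
Both halves of the mechanism, the receiver's window computation and the sender's limit, are one-line formulas. The Python sketch below is a hedged illustration: `rcv_window` mirrors the formula from the slide, while `can_send` and its parameters `last_byte_sent` and `last_byte_acked` are names assumed here for the sender's bookkeeping of unACKed data.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def can_send(nbytes: int, last_byte_sent: int, last_byte_acked: int,
             window: int) -> bool:
    """True if sending nbytes more keeps unACKed data within the window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= window

# Receiver: 4096-byte buffer, 3000 bytes received, application has read 1000.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)

# Sender: 1000 bytes already in flight.
small_ok = can_send(1000, last_byte_sent=5000, last_byte_acked=4000, window=window)
large_ok = can_send(1200, last_byte_sent=5000, last_byte_acked=4000, window=window)
```

Here the receiver advertises 2096 bytes, so a further 1000 bytes may be sent but a further 1200 bytes may not.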

TCP flow control: How it works (2)
- TCP is a sliding window protocol
  - For window size n, can send up to n bytes without receiving an acknowledgement
  - When the data is acknowledged, the window slides forward
- Each packet advertises a window size
  - Indicates number of bytes the receiver has space for
- Original TCP always sent entire window
  - Congestion control now limits this
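
Window flow control: Sender side
[Figure: byte stream divided into "sent and ACKed", "sent but not ACKed", and "not yet sent", with the window and a "next to be sent" pointer marked.]

Window flow control: Receiver side
[Figure: receive buffer with data "ACKed but not delivered to user" and data "not yet ACKed", with the window marked.]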

TCP persist
- What happens if window is 0?
  - Receiver updates window (i.e., sends ACK with new window size) when application reads data
  - What if this update is lost?
- TCP persist state
  - Sender periodically sends 1-byte packets
  - Receiver responds with ACK even if it can't store the packet

Observed TCP problems
- Too many small packets
  - Silly window syndrome
  - Nagle's algorithm
- Initial sequence number selection
- Amount of state maintained

Silly window syndrome
- Problem: (Clark, 1982)
  - If receiver advertises small increases in the receive window, then the sender may waste time sending lots of small packets
- Solution
  - Receiver must not advertise small window increases
  - Increase window by min(MSS, RecvBuffer/2)
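
The receiver-side rule can be expressed as a threshold test: keep advertising the old window until the free space has grown by at least min(MSS, RecvBuffer/2). A minimal Python sketch under that reading, with the function name and parameter names chosen for this example only:

```python
def next_advertised_window(advertised: int, free_space: int,
                           mss: int, recv_buffer: int) -> int:
    """Raise the advertised window only when the increase is worthwhile."""
    threshold = min(mss, recv_buffer // 2)
    if free_space - advertised >= threshold:
        return free_space      # increase is large enough to advertise
    return advertised          # suppress the small increase

# The application has freed only 100 bytes: keep advertising a closed window.
still_closed = next_advertised_window(advertised=0, free_space=100,
                                      mss=1460, recv_buffer=8192)
# A full MSS is free: now open the window.
opened = next_advertised_window(advertised=0, free_space=1460,
                                mss=1460, recv_buffer=8192)
```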

Nagle's algorithm
- Small packet problem:
  - Don't want to send a 41-byte packet for each keystroke
  - How long to wait for more data?
- Solution:
  - Allow only one outstanding small (not full-sized) segment that has not yet been acknowledged
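
The "one outstanding small segment" rule can be simulated with a buffer and a flag. The Python sketch below is an illustration under simplifying assumptions: the class `NagleSender` and its `sent` log are invented, every call to `on_ack` is assumed to acknowledge the outstanding small segment, and no timers or windows are modelled.

```python
class NagleSender:
    """Coalesces small writes while one small segment is unacknowledged."""

    def __init__(self, mss: int):
        self.mss = mss
        self.buffer = b""
        self.small_outstanding = False   # is a not-full-sized segment unACKed?
        self.sent = []                   # segments handed to the network

    def write(self, data: bytes) -> None:
        self.buffer += data
        self._try_send()

    def on_ack(self) -> None:
        # Simplification: any ACK covers the outstanding small segment.
        self.small_outstanding = False
        self._try_send()

    def _try_send(self) -> None:
        while self.buffer:
            if len(self.buffer) >= self.mss:
                chunk = self.buffer[:self.mss]      # full-sized: always allowed
            elif not self.small_outstanding:
                chunk = self.buffer                 # the one permitted small segment
                self.small_outstanding = True
            else:
                break                               # hold back until an ACK arrives
            self.sent.append(chunk)
            self.buffer = self.buffer[len(chunk):]

n = NagleSender(mss=4)
n.write(b"a")       # first keystroke goes out immediately
n.write(b"b")       # held back: a small segment is outstanding
n.write(b"c")       # held back as well
n.on_ack()          # the ACK releases the coalesced "bc"
n.write(b"defg")    # full-sized segment is sent even though "bc" is unACKed
```

Three keystrokes produce two segments instead of three, and a full-sized write is never delayed.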

Why is selecting ISN important?
- Suppose machine X selects ISN based on predictable sequence
- Fred has .rhosts to allow login to X from Y
- Evil Ed attacks
  - Disables host Y: denial of service attack
  - Makes a bunch of connections to host X
  - Determines ISN pattern and guesses next ISN
  - Fake pkt1: [<src Y><dst X>, guessed ISN]
  - Fake pkt2: desired command

Time Wait issues
- Web servers, not clients, close connection first
  - Established -> Fin-Waits -> Time-Wait -> Closed
  - Why would this be a problem?
- Time-Wait state lasts for 2 * MSL
  - MSL should be 120 seconds (is often 60 s)
  - Servers often have an order of magnitude more connections in Time-Wait