ch 3chapter 3 transport layer
TRANSCRIPT
Ch 3Chapter 3Transport LayerTransport Layer
Computer Networking A T D A h
A note on the use of these ppt slidesWersquore making these slides freely available to all (faculty students readers) Theyrsquore in PowerPoint form so you can add modify and delete slides (including this one) and slide content to suit your needs They obviously
A Top Down Approach 5th edition Jim Kurose Keith RossAddis W sl A il
( g ) y y yrepresent a lot of work on our part In return for use we only ask the following
If you use these slides (eg in a class) in substantially unaltered form that you mention their source (after all wersquod like people to use our book)
If you post any slides in substantially unaltered form on a www site that Addison-Wesley April 2009
If you post any slides in substantially unaltered form on a www site that you note that they are adapted from (or perhaps identical to) our slides and note our copyright of this material
Thanks and enjoy JFKKWR
Transport Layer 3-1
All material copyright 1996-2009JF Kurose and KW Ross All Rights Reserved
Chapter 3 Transport LayerChapter 3 Transport LayerOur goalsOur goa s
understand principles behind transport
learn about transport layer protocols in the p
layer servicesmultiplexingdemultiplxin
y pInternet
UDP connectionless transportexing
reliable data transferflow control
transportTCP connection-oriented transportflow control
congestion control TCP congestion control
Transport Layer 3-2
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-3
Transport services and protocolsTransport services and protocolsprovide logical communication
applicationtransportnetworkd li k
p gbetween app processes running on different hoststransport protocols run in
data linkphysical
transport protocols run in end systems
send side breaks app messages into segments passes to network layerrcv side reassembles applicationrcv side reassembles segments into messages passes to app layer
th t s t
pptransportnetworkdata linkphysical
more than one transport protocol available to apps
Internet TCP and UDP
Transport Layer 3-4
Transport vs network layerTransport vs network layer
k l l i l Household analogynetwork layer logical communication between hosts
Household analogy12 kids sending letters to
12 kidsbetween hoststransport layer logical communication
12 kidsprocesses = kidsapp messages = letters communication
between processes relies on enhances
app messages = letters in envelopeshosts = housesnetwork layer services hosts = housestransport protocol = Ann and BillAnn and Billnetwork-layer protocol = postal service
Transport Layer 3-5
p
Internet transport-layer protocolsp y p
reliable in-order applicationtransportreliable in order
delivery (TCP)congestion control
transportnetworkdata linkphysical
networkdata linkphysical network
flow controlconnection setup
li bl d d
physical
n t k
data linkphysical
unreliable unordered delivery UDP
no frills extension of network
networkdata linkphysical
networkdata linkphysical
no-frills extension of ldquobest-effortrdquo IP
services not available networkdata linkphysical
networkdata linkphysical application
transportnetworkdata linkphysical
delay guaranteesbandwidth guarantees
physical
Transport Layer 3-6
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-7
MultiplexingdemultiplexingMultiplexingdemultiplexingDemultiplexing at rcv host
th i d t f lti lMultiplexing at send host
delivering received segmentsto correct socket
gathering data from multiplesockets enveloping data with header (later used for
= process= socketdemultiplexing)
application
transport
P1 application
transport
application
transport
P2P3 P4P1
transport
network
link
network
link
transport
network
linklink
physical
link
physical
link
physical
host 3
Transport Layer 3-8
host 1 host 2 host 3
How demultiplexing worksHow demultiplexing workshost receives IP datagrams
h d t h each datagram has source IP address destination IP address source port dest port
32 bits
each datagram carries 1 transport-layer segmenteach segment has source
other header fieldseach segment has source destination port number
host uses IP addresses amp port applicationnumbers to direct segment to appropriate socket
ppdata
(message)
TCPUDP segment format
Transport Layer 3-9
Connectionless demultiplexingConnectionless demultiplexing
C k i h When host receives UDP Create sockets with port numbers
DatagramSocket mySocket1 = new
When host receives UDP segment
checks destination port DatagramSocket mySocket1 = new DatagramSocket(12534)
DatagramSocket mySocket2 = new DatagramSocket(12535)
pnumber in segmentdirects UDP segment to socket with that port DatagramSocket(12535)
UDP socket identified by two-tuple
socket with that port number
IP datagrams with two tuple(dest IP address dest port number)
gdifferent source IP addresses andor source
t b di t d port numbers directed to same socket
Transport Layer 3-10
Connectionless demux (cont)Connectionless demux (cont)k k k (6428)DatagramSocket serverSocket = new DatagramSocket(6428)
P2 P1P1P3P2 P1P3
SP 6428DP 9157
SP 6428DP 5775
ClientIP B
clientIP A
serverSP 9157DP 6428
SP 5775DP 6428
IPBIP A IP C
SP provides ldquoreturn addressrdquo
Transport Layer 3-11
SP provides return address
Connection-oriented demuxConnection-oriented demux
TCP k id ifi d h TCP socket identified by 4-tuple
source IP address
Server host may support many simultaneous TCP socketssource IP address
source port numberdest IP address
socketseach socket identified by its own 4-tuple
dest port numberrecv host uses all four
Web servers have different sockets for
h i livalues to direct segment to appropriate s ck t
each connecting clientnon-persistent HTTP will have different socket for socket have different socket for each request
Transport Layer 3-12
Connection-oriented demux (cont)
P1 P1P2P4 P5 P6 P3P1 P2P4 P5 P6 P3
SP 5775DP 80D 8
D-IPCS-IP B
ClientIP B
clientIP A
serverSP 9157DP 80
SP 9157DP 80
S IP A S IP B IPBIP A IP CD-IPC
S-IP AD-IPC
S-IP B
Transport Layer 3-13
Connection-oriented demux Threaded Web Server
P1 P1P2P4 P3P1 P2P4 P3
SP 5775DP 80D 8
D-IPCS-IP B
ClientIP B
clientIP A
serverSP 9157DP 80
SP 9157DP 80
S IP A S IP B IPBIP A IP CD-IPC
S-IP AD-IPC
S-IP B
Transport Layer 3-14
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-15
UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]
ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP
Why is there a UDPno connection
ldquobest effortrdquo service UDP segments may be
lost
establishment (which can add delay)simple no connection state
delivered out of order to app
c nn cti nl ss
simple no connection state at sender receiversmall segment header
connectionlessno handshaking between UDP sender receiver
no congestion control UDP can blast away as fast as desired
each UDP segment handled independently of others
Transport Layer 3-16
of others
UDP moreUDP moreoften used for streaming often used for streaming multimedia apps
loss tolerant source port dest port
32 bits
Length in f rate sensitive
other UDP usesDNS
length checksumbytes of UDPsegmentincluding
h dDNSSNMP
reliable transfer over UDP Application
header
reliable transfer over UDP add reliability at application layer
li ti s ifi
ppdata
(message)application-specific error recovery
UDP segment format
Transport Layer 3-17
UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted
segment
Sendertreat segment contents
f 16 bit
Receivercompute checksum of
i d tas sequence of 16-bit integerschecksum addition (1rsquos
received segmentcheck if computed checksum equals checksum field value(
complement sum) of segment contentssender puts checksum
qNO - error detectedYES - no error detected B t m b s sender puts checksum
value into UDP checksum field
But maybe errors nonetheless More later hellip
Transport Layer 3-18
Internet Checksum ExamplepNote
When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
wraparound
sumh k
Transport Layer 3-19
1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-20
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-21
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-22
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-23
complexity of reliable data transfer protocol (rdt)
Reliable data transfer getting startedReliable data transfer getting started
rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to
deliver to receiver upper layer
deliver_data() called by rdt to deliver data to upper
send receivesendside
receiveside
udt_send() called by rdtto transfer packet over
li bl h l t i
rdt_rcv() called when packet arrives on rcv-side of channel
Transport Layer 3-24
unreliable channel to receiver
Reliable data transfer getting startedReliable data transfer getting startedWersquoll
incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer
but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver
event causing state transition
state state
gactions taken on state transition
state when in this ldquostaterdquo next state
1state
2state next state
uniquely determined by next event
eventactions
Transport Layer 3-25
Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel
underlying channel perfectly reliabley g p yno bit errorsno loss of packets
separate FSMs for sender receiversender sends data into underlying channel
i d d t f d l i h lreceiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
udt_send(packet)
sender receiver
Transport Layer 3-26
Rdt2 0 channel with bit errorsRdt20 channel with bit errors
underlying channel may flip bits in packety g y p pchecksum to detect bit errors
the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK
new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender
Transport Layer 3-27
rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowdΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
belowsender
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-28
udt_send(ACK)
rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-29
udt_send(ACK)
rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-30
udt_send(ACK)
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-3
Transport services and protocolsTransport services and protocolsprovide logical communication
applicationtransportnetworkd li k
p gbetween app processes running on different hoststransport protocols run in
data linkphysical
transport protocols run in end systems
send side breaks app messages into segments passes to network layerrcv side reassembles applicationrcv side reassembles segments into messages passes to app layer
th t s t
pptransportnetworkdata linkphysical
more than one transport protocol available to apps
Internet TCP and UDP
Transport Layer 3-4
Transport vs network layerTransport vs network layer
k l l i l Household analogynetwork layer logical communication between hosts
Household analogy12 kids sending letters to
12 kidsbetween hoststransport layer logical communication
12 kidsprocesses = kidsapp messages = letters communication
between processes relies on enhances
app messages = letters in envelopeshosts = housesnetwork layer services hosts = housestransport protocol = Ann and BillAnn and Billnetwork-layer protocol = postal service
Transport Layer 3-5
p
Internet transport-layer protocolsp y p
reliable in-order applicationtransportreliable in order
delivery (TCP)congestion control
transportnetworkdata linkphysical
networkdata linkphysical network
flow controlconnection setup
li bl d d
physical
n t k
data linkphysical
unreliable unordered delivery UDP
no frills extension of network
networkdata linkphysical
networkdata linkphysical
no-frills extension of ldquobest-effortrdquo IP
services not available networkdata linkphysical
networkdata linkphysical application
transportnetworkdata linkphysical
delay guaranteesbandwidth guarantees
physical
Transport Layer 3-6
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-7
MultiplexingdemultiplexingMultiplexingdemultiplexingDemultiplexing at rcv host
th i d t f lti lMultiplexing at send host
delivering received segmentsto correct socket
gathering data from multiplesockets enveloping data with header (later used for
= process= socketdemultiplexing)
application
transport
P1 application
transport
application
transport
P2P3 P4P1
transport
network
link
network
link
transport
network
linklink
physical
link
physical
link
physical
host 3
Transport Layer 3-8
host 1 host 2 host 3
How demultiplexing worksHow demultiplexing workshost receives IP datagrams
h d t h each datagram has source IP address destination IP address source port dest port
32 bits
each datagram carries 1 transport-layer segmenteach segment has source
other header fieldseach segment has source destination port number
host uses IP addresses amp port applicationnumbers to direct segment to appropriate socket
ppdata
(message)
TCPUDP segment format
Transport Layer 3-9
Connectionless demultiplexingConnectionless demultiplexing
C k i h When host receives UDP Create sockets with port numbers
DatagramSocket mySocket1 = new
When host receives UDP segment
checks destination port DatagramSocket mySocket1 = new DatagramSocket(12534)
DatagramSocket mySocket2 = new DatagramSocket(12535)
pnumber in segmentdirects UDP segment to socket with that port DatagramSocket(12535)
UDP socket identified by two-tuple
socket with that port number
IP datagrams with two tuple(dest IP address dest port number)
gdifferent source IP addresses andor source
t b di t d port numbers directed to same socket
Transport Layer 3-10
Connectionless demux (cont)Connectionless demux (cont)k k k (6428)DatagramSocket serverSocket = new DatagramSocket(6428)
P2 P1P1P3P2 P1P3
SP 6428DP 9157
SP 6428DP 5775
ClientIP B
clientIP A
serverSP 9157DP 6428
SP 5775DP 6428
IPBIP A IP C
SP provides ldquoreturn addressrdquo
Transport Layer 3-11
SP provides return address
Connection-oriented demuxConnection-oriented demux
TCP k id ifi d h TCP socket identified by 4-tuple
source IP address
Server host may support many simultaneous TCP socketssource IP address
source port numberdest IP address
socketseach socket identified by its own 4-tuple
dest port numberrecv host uses all four
Web servers have different sockets for
h i livalues to direct segment to appropriate s ck t
each connecting clientnon-persistent HTTP will have different socket for socket have different socket for each request
Transport Layer 3-12
Connection-oriented demux (cont)
P1 P1P2P4 P5 P6 P3P1 P2P4 P5 P6 P3
SP 5775DP 80D 8
D-IPCS-IP B
ClientIP B
clientIP A
serverSP 9157DP 80
SP 9157DP 80
S IP A S IP B IPBIP A IP CD-IPC
S-IP AD-IPC
S-IP B
Transport Layer 3-13
Connection-oriented demux Threaded Web Server
P1 P1P2P4 P3P1 P2P4 P3
SP 5775DP 80D 8
D-IPCS-IP B
ClientIP B
clientIP A
serverSP 9157DP 80
SP 9157DP 80
S IP A S IP B IPBIP A IP CD-IPC
S-IP AD-IPC
S-IP B
Transport Layer 3-14
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-15
UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]
ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP
Why is there a UDPno connection
ldquobest effortrdquo service UDP segments may be
lost
establishment (which can add delay)simple no connection state
delivered out of order to app
c nn cti nl ss
simple no connection state at sender receiversmall segment header
connectionlessno handshaking between UDP sender receiver
no congestion control UDP can blast away as fast as desired
each UDP segment handled independently of others
Transport Layer 3-16
of others
UDP moreUDP moreoften used for streaming often used for streaming multimedia apps
loss tolerant source port dest port
32 bits
Length in f rate sensitive
other UDP usesDNS
length checksumbytes of UDPsegmentincluding
h dDNSSNMP
reliable transfer over UDP Application
header
reliable transfer over UDP add reliability at application layer
li ti s ifi
ppdata
(message)application-specific error recovery
UDP segment format
Transport Layer 3-17
UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted
segment
Sendertreat segment contents
f 16 bit
Receivercompute checksum of
i d tas sequence of 16-bit integerschecksum addition (1rsquos
received segmentcheck if computed checksum equals checksum field value(
complement sum) of segment contentssender puts checksum
qNO - error detectedYES - no error detected B t m b s sender puts checksum
value into UDP checksum field
But maybe errors nonetheless More later hellip
Transport Layer 3-18
Internet Checksum ExamplepNote
When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
wraparound
sumh k
Transport Layer 3-19
1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-20
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-21
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-22
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-23
complexity of reliable data transfer protocol (rdt)
Reliable data transfer getting startedReliable data transfer getting started
rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to
deliver to receiver upper layer
deliver_data() called by rdt to deliver data to upper
send receivesendside
receiveside
udt_send() called by rdtto transfer packet over
li bl h l t i
rdt_rcv() called when packet arrives on rcv-side of channel
Transport Layer 3-24
unreliable channel to receiver
Reliable data transfer getting startedReliable data transfer getting startedWersquoll
incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer
but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver
event causing state transition
state state
gactions taken on state transition
state when in this ldquostaterdquo next state
1state
2state next state
uniquely determined by next event
eventactions
Transport Layer 3-25
Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel
underlying channel perfectly reliabley g p yno bit errorsno loss of packets
separate FSMs for sender receiversender sends data into underlying channel
i d d t f d l i h lreceiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
udt_send(packet)
sender receiver
Transport Layer 3-26
Rdt2 0 channel with bit errorsRdt20 channel with bit errors
underlying channel may flip bits in packety g y p pchecksum to detect bit errors
the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK
new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender
Transport Layer 3-27
rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowdΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
belowsender
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-28
udt_send(ACK)
rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-29
udt_send(ACK)
rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-30
udt_send(ACK)
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Transport vs network layerTransport vs network layer
k l l i l Household analogynetwork layer logical communication between hosts
Household analogy12 kids sending letters to
12 kidsbetween hoststransport layer logical communication
12 kidsprocesses = kidsapp messages = letters communication
between processes relies on enhances
app messages = letters in envelopeshosts = housesnetwork layer services hosts = housestransport protocol = Ann and BillAnn and Billnetwork-layer protocol = postal service
Transport Layer 3-5
p
Internet transport-layer protocolsp y p
reliable in-order applicationtransportreliable in order
delivery (TCP)congestion control
transportnetworkdata linkphysical
networkdata linkphysical network
flow controlconnection setup
li bl d d
physical
n t k
data linkphysical
unreliable unordered delivery UDP
no frills extension of network
networkdata linkphysical
networkdata linkphysical
no-frills extension of ldquobest-effortrdquo IP
services not available networkdata linkphysical
networkdata linkphysical application
transportnetworkdata linkphysical
delay guaranteesbandwidth guarantees
physical
Transport Layer 3-6
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-7
MultiplexingdemultiplexingMultiplexingdemultiplexingDemultiplexing at rcv host
th i d t f lti lMultiplexing at send host
delivering received segmentsto correct socket
gathering data from multiplesockets enveloping data with header (later used for
= process= socketdemultiplexing)
application
transport
P1 application
transport
application
transport
P2P3 P4P1
transport
network
link
network
link
transport
network
linklink
physical
link
physical
link
physical
host 3
Transport Layer 3-8
host 1 host 2 host 3
How demultiplexing worksHow demultiplexing workshost receives IP datagrams
h d t h each datagram has source IP address destination IP address source port dest port
32 bits
each datagram carries 1 transport-layer segmenteach segment has source
other header fieldseach segment has source destination port number
host uses IP addresses amp port applicationnumbers to direct segment to appropriate socket
ppdata
(message)
TCPUDP segment format
Transport Layer 3-9
Connectionless demultiplexingConnectionless demultiplexing
C k i h When host receives UDP Create sockets with port numbers
DatagramSocket mySocket1 = new
When host receives UDP segment
checks destination port DatagramSocket mySocket1 = new DatagramSocket(12534)
DatagramSocket mySocket2 = new DatagramSocket(12535)
pnumber in segmentdirects UDP segment to socket with that port DatagramSocket(12535)
UDP socket identified by two-tuple
socket with that port number
IP datagrams with two tuple(dest IP address dest port number)
gdifferent source IP addresses andor source
t b di t d port numbers directed to same socket
Transport Layer 3-10
Connectionless demux (cont)Connectionless demux (cont)k k k (6428)DatagramSocket serverSocket = new DatagramSocket(6428)
P2 P1P1P3P2 P1P3
SP 6428DP 9157
SP 6428DP 5775
ClientIP B
clientIP A
serverSP 9157DP 6428
SP 5775DP 6428
IPBIP A IP C
SP provides ldquoreturn addressrdquo
Transport Layer 3-11
SP provides return address
Connection-oriented demuxConnection-oriented demux
TCP k id ifi d h TCP socket identified by 4-tuple
source IP address
Server host may support many simultaneous TCP socketssource IP address
source port numberdest IP address
socketseach socket identified by its own 4-tuple
dest port numberrecv host uses all four
Web servers have different sockets for
h i livalues to direct segment to appropriate s ck t
each connecting clientnon-persistent HTTP will have different socket for socket have different socket for each request
Transport Layer 3-12
Connection-oriented demux (cont)
P1 P1P2P4 P5 P6 P3P1 P2P4 P5 P6 P3
SP 5775DP 80D 8
D-IPCS-IP B
ClientIP B
clientIP A
serverSP 9157DP 80
SP 9157DP 80
S IP A S IP B IPBIP A IP CD-IPC
S-IP AD-IPC
S-IP B
Transport Layer 3-13
Connection-oriented demux Threaded Web Server
P1 P1P2P4 P3P1 P2P4 P3
SP 5775DP 80D 8
D-IPCS-IP B
ClientIP B
clientIP A
serverSP 9157DP 80
SP 9157DP 80
S IP A S IP B IPBIP A IP CD-IPC
S-IP AD-IPC
S-IP B
Transport Layer 3-14
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-15
UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]
ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP
Why is there a UDPno connection
ldquobest effortrdquo service UDP segments may be
lost
establishment (which can add delay)simple no connection state
delivered out of order to app
c nn cti nl ss
simple no connection state at sender receiversmall segment header
connectionlessno handshaking between UDP sender receiver
no congestion control UDP can blast away as fast as desired
each UDP segment handled independently of others
Transport Layer 3-16
of others
UDP moreUDP moreoften used for streaming often used for streaming multimedia apps
loss tolerant source port dest port
32 bits
Length in f rate sensitive
other UDP usesDNS
length checksumbytes of UDPsegmentincluding
h dDNSSNMP
reliable transfer over UDP Application
header
reliable transfer over UDP add reliability at application layer
li ti s ifi
ppdata
(message)application-specific error recovery
UDP segment format
Transport Layer 3-17
UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted
segment
Sendertreat segment contents
f 16 bit
Receivercompute checksum of
i d tas sequence of 16-bit integerschecksum addition (1rsquos
received segmentcheck if computed checksum equals checksum field value(
complement sum) of segment contentssender puts checksum
qNO - error detectedYES - no error detected B t m b s sender puts checksum
value into UDP checksum field
But maybe errors nonetheless More later hellip
Transport Layer 3-18
Internet Checksum ExamplepNote
When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
wraparound
sumh k
Transport Layer 3-19
1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-20
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-21
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-22
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-23
complexity of reliable data transfer protocol (rdt)
Reliable data transfer getting startedReliable data transfer getting started
rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to
deliver to receiver upper layer
deliver_data() called by rdt to deliver data to upper
send receivesendside
receiveside
udt_send() called by rdtto transfer packet over
li bl h l t i
rdt_rcv() called when packet arrives on rcv-side of channel
Transport Layer 3-24
unreliable channel to receiver
Reliable data transfer getting startedReliable data transfer getting startedWersquoll
incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer
but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver
event causing state transition
state state
gactions taken on state transition
state when in this ldquostaterdquo next state
1state
2state next state
uniquely determined by next event
eventactions
Transport Layer 3-25
Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel
underlying channel perfectly reliabley g p yno bit errorsno loss of packets
separate FSMs for sender receiversender sends data into underlying channel
i d d t f d l i h lreceiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
udt_send(packet)
sender receiver
Transport Layer 3-26
Rdt2 0 channel with bit errorsRdt20 channel with bit errors
underlying channel may flip bits in packety g y p pchecksum to detect bit errors
the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK
new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender
Transport Layer 3-27
rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowdΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
belowsender
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-28
udt_send(ACK)
rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-29
udt_send(ACK)
rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-30
udt_send(ACK)
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-7
MultiplexingdemultiplexingMultiplexingdemultiplexingDemultiplexing at rcv host
th i d t f lti lMultiplexing at send host
delivering received segmentsto correct socket
gathering data from multiplesockets enveloping data with header (later used for
= process= socketdemultiplexing)
application
transport
P1 application
transport
application
transport
P2P3 P4P1
transport
network
link
network
link
transport
network
linklink
physical
link
physical
link
physical
host 3
Transport Layer 3-8
host 1 host 2 host 3
How demultiplexing worksHow demultiplexing workshost receives IP datagrams
h d t h each datagram has source IP address destination IP address source port dest port
32 bits
each datagram carries 1 transport-layer segmenteach segment has source
other header fieldseach segment has source destination port number
host uses IP addresses amp port applicationnumbers to direct segment to appropriate socket
ppdata
(message)
TCPUDP segment format
Transport Layer 3-9
Connectionless demultiplexingConnectionless demultiplexing
C k i h When host receives UDP Create sockets with port numbers
DatagramSocket mySocket1 = new
When host receives UDP segment
checks destination port DatagramSocket mySocket1 = new DatagramSocket(12534)
DatagramSocket mySocket2 = new DatagramSocket(12535)
pnumber in segmentdirects UDP segment to socket with that port DatagramSocket(12535)
UDP socket identified by two-tuple
socket with that port number
IP datagrams with two tuple(dest IP address dest port number)
gdifferent source IP addresses andor source
t b di t d port numbers directed to same socket
Transport Layer 3-10
Connectionless demux (cont)Connectionless demux (cont)k k k (6428)DatagramSocket serverSocket = new DatagramSocket(6428)
P2 P1P1P3P2 P1P3
SP 6428DP 9157
SP 6428DP 5775
ClientIP B
clientIP A
serverSP 9157DP 6428
SP 5775DP 6428
IPBIP A IP C
SP provides ldquoreturn addressrdquo
Transport Layer 3-11
SP provides return address
Connection-oriented demuxConnection-oriented demux
TCP k id ifi d h TCP socket identified by 4-tuple
source IP address
Server host may support many simultaneous TCP socketssource IP address
source port numberdest IP address
socketseach socket identified by its own 4-tuple
dest port numberrecv host uses all four
Web servers have different sockets for
h i livalues to direct segment to appropriate s ck t
each connecting clientnon-persistent HTTP will have different socket for socket have different socket for each request
Transport Layer 3-12
Connection-oriented demux (cont)
P1 P1P2P4 P5 P6 P3P1 P2P4 P5 P6 P3
SP 5775DP 80D 8
D-IPCS-IP B
ClientIP B
clientIP A
serverSP 9157DP 80
SP 9157DP 80
S IP A S IP B IPBIP A IP CD-IPC
S-IP AD-IPC
S-IP B
Transport Layer 3-13
Connection-oriented demux Threaded Web Server
P1 P1P2P4 P3P1 P2P4 P3
SP 5775DP 80D 8
D-IPCS-IP B
ClientIP B
clientIP A
serverSP 9157DP 80
SP 9157DP 80
S IP A S IP B IPBIP A IP CD-IPC
S-IP AD-IPC
S-IP B
Transport Layer 3-14
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-15
UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]
ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP
Why is there a UDPno connection
ldquobest effortrdquo service UDP segments may be
lost
establishment (which can add delay)simple no connection state
delivered out of order to app
c nn cti nl ss
simple no connection state at sender receiversmall segment header
connectionlessno handshaking between UDP sender receiver
no congestion control UDP can blast away as fast as desired
each UDP segment handled independently of others
Transport Layer 3-16
of others
UDP moreUDP moreoften used for streaming often used for streaming multimedia apps
loss tolerant source port dest port
32 bits
Length in f rate sensitive
other UDP usesDNS
length checksumbytes of UDPsegmentincluding
h dDNSSNMP
reliable transfer over UDP Application
header
reliable transfer over UDP add reliability at application layer
li ti s ifi
ppdata
(message)application-specific error recovery
UDP segment format
Transport Layer 3-17
UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted
segment
Sendertreat segment contents
f 16 bit
Receivercompute checksum of
i d tas sequence of 16-bit integerschecksum addition (1rsquos
received segmentcheck if computed checksum equals checksum field value(
complement sum) of segment contentssender puts checksum
qNO - error detectedYES - no error detected B t m b s sender puts checksum
value into UDP checksum field
But maybe errors nonetheless More later hellip
Transport Layer 3-18
Internet Checksum ExamplepNote
When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
wraparound
sumh k
Transport Layer 3-19
1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-20
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-21
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-22
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-23
complexity of reliable data transfer protocol (rdt)
Reliable data transfer getting startedReliable data transfer getting started
rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to
deliver to receiver upper layer
deliver_data() called by rdt to deliver data to upper
send receivesendside
receiveside
udt_send() called by rdtto transfer packet over
li bl h l t i
rdt_rcv() called when packet arrives on rcv-side of channel
Transport Layer 3-24
unreliable channel to receiver
Reliable data transfer getting startedReliable data transfer getting startedWersquoll
incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer
but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver
event causing state transition
state state
gactions taken on state transition
state when in this ldquostaterdquo next state
1state
2state next state
uniquely determined by next event
eventactions
Transport Layer 3-25
Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel
underlying channel perfectly reliabley g p yno bit errorsno loss of packets
separate FSMs for sender receiversender sends data into underlying channel
i d d t f d l i h lreceiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
udt_send(packet)
sender receiver
Transport Layer 3-26
Rdt2 0 channel with bit errorsRdt20 channel with bit errors
underlying channel may flip bits in packety g y p pchecksum to detect bit errors
the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK
new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender
Transport Layer 3-27
rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowdΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
belowsender
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-28
udt_send(ACK)
rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-29
udt_send(ACK)
rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-30
udt_send(ACK)
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
How demultiplexing worksHow demultiplexing workshost receives IP datagrams
h d t h each datagram has source IP address destination IP address source port dest port
32 bits
each datagram carries 1 transport-layer segmenteach segment has source
other header fieldseach segment has source destination port number
host uses IP addresses amp port applicationnumbers to direct segment to appropriate socket
ppdata
(message)
TCPUDP segment format
Transport Layer 3-9
Connectionless demultiplexingConnectionless demultiplexing
C k i h When host receives UDP Create sockets with port numbers
DatagramSocket mySocket1 = new
When host receives UDP segment
checks destination port DatagramSocket mySocket1 = new DatagramSocket(12534)
DatagramSocket mySocket2 = new DatagramSocket(12535)
pnumber in segmentdirects UDP segment to socket with that port DatagramSocket(12535)
UDP socket identified by two-tuple
socket with that port number
IP datagrams with two tuple(dest IP address dest port number)
gdifferent source IP addresses andor source
t b di t d port numbers directed to same socket
Transport Layer 3-10
Connectionless demux (cont)Connectionless demux (cont)k k k (6428)DatagramSocket serverSocket = new DatagramSocket(6428)
P2 P1P1P3P2 P1P3
SP 6428DP 9157
SP 6428DP 5775
ClientIP B
clientIP A
serverSP 9157DP 6428
SP 5775DP 6428
IPBIP A IP C
SP provides ldquoreturn addressrdquo
Transport Layer 3-11
SP provides return address
Connection-oriented demuxConnection-oriented demux
TCP k id ifi d h TCP socket identified by 4-tuple
source IP address
Server host may support many simultaneous TCP socketssource IP address
source port numberdest IP address
socketseach socket identified by its own 4-tuple
dest port numberrecv host uses all four
Web servers have different sockets for
h i livalues to direct segment to appropriate s ck t
each connecting clientnon-persistent HTTP will have different socket for socket have different socket for each request
Transport Layer 3-12
Connection-oriented demux (cont)
P1 P1P2P4 P5 P6 P3P1 P2P4 P5 P6 P3
SP 5775DP 80D 8
D-IPCS-IP B
ClientIP B
clientIP A
serverSP 9157DP 80
SP 9157DP 80
S IP A S IP B IPBIP A IP CD-IPC
S-IP AD-IPC
S-IP B
Transport Layer 3-13
Connection-oriented demux Threaded Web Server
P1 P1P2P4 P3P1 P2P4 P3
SP 5775DP 80D 8
D-IPCS-IP B
ClientIP B
clientIP A
serverSP 9157DP 80
SP 9157DP 80
S IP A S IP B IPBIP A IP CD-IPC
S-IP AD-IPC
S-IP B
Transport Layer 3-14
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-15
UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]
ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP
Why is there a UDPno connection
ldquobest effortrdquo service UDP segments may be
lost
establishment (which can add delay)simple no connection state
delivered out of order to app
c nn cti nl ss
simple no connection state at sender receiversmall segment header
connectionlessno handshaking between UDP sender receiver
no congestion control UDP can blast away as fast as desired
each UDP segment handled independently of others
Transport Layer 3-16
of others
UDP moreUDP moreoften used for streaming often used for streaming multimedia apps
loss tolerant source port dest port
32 bits
Length in f rate sensitive
other UDP usesDNS
length checksumbytes of UDPsegmentincluding
h dDNSSNMP
reliable transfer over UDP Application
header
reliable transfer over UDP add reliability at application layer
li ti s ifi
ppdata
(message)application-specific error recovery
UDP segment format
Transport Layer 3-17
UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted
segment
Sendertreat segment contents
f 16 bit
Receivercompute checksum of
i d tas sequence of 16-bit integerschecksum addition (1rsquos
received segmentcheck if computed checksum equals checksum field value(
complement sum) of segment contentssender puts checksum
qNO - error detectedYES - no error detected B t m b s sender puts checksum
value into UDP checksum field
But maybe errors nonetheless More later hellip
Transport Layer 3-18
Internet Checksum ExamplepNote
When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
wraparound
sumh k
Transport Layer 3-19
1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-20
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-21
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-22
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-23
complexity of reliable data transfer protocol (rdt)
Reliable data transfer getting startedReliable data transfer getting started
rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to
deliver to receiver upper layer
deliver_data() called by rdt to deliver data to upper
send receivesendside
receiveside
udt_send() called by rdtto transfer packet over
li bl h l t i
rdt_rcv() called when packet arrives on rcv-side of channel
Transport Layer 3-24
unreliable channel to receiver
Reliable data transfer getting startedReliable data transfer getting startedWersquoll
incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer
but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver
event causing state transition
state state
gactions taken on state transition
state when in this ldquostaterdquo next state
1state
2state next state
uniquely determined by next event
eventactions
Transport Layer 3-25
Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel
underlying channel perfectly reliabley g p yno bit errorsno loss of packets
separate FSMs for sender receiversender sends data into underlying channel
i d d t f d l i h lreceiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
udt_send(packet)
sender receiver
Transport Layer 3-26
Rdt2 0 channel with bit errorsRdt20 channel with bit errors
underlying channel may flip bits in packety g y p pchecksum to detect bit errors
the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK
new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender
Transport Layer 3-27
rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowdΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
belowsender
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-28
udt_send(ACK)
rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-29
udt_send(ACK)
rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-30
udt_send(ACK)
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Connectionless demux (cont)Connectionless demux (cont)k k k (6428)DatagramSocket serverSocket = new DatagramSocket(6428)
P2 P1P1P3P2 P1P3
SP 6428DP 9157
SP 6428DP 5775
ClientIP B
clientIP A
serverSP 9157DP 6428
SP 5775DP 6428
IPBIP A IP C
SP provides ldquoreturn addressrdquo
Transport Layer 3-11
SP provides return address
Connection-oriented demuxConnection-oriented demux
TCP k id ifi d h TCP socket identified by 4-tuple
source IP address
Server host may support many simultaneous TCP socketssource IP address
source port numberdest IP address
socketseach socket identified by its own 4-tuple
dest port numberrecv host uses all four
Web servers have different sockets for
h i livalues to direct segment to appropriate s ck t
each connecting clientnon-persistent HTTP will have different socket for socket have different socket for each request
Transport Layer 3-12
Connection-oriented demux (cont)
P1 P1P2P4 P5 P6 P3P1 P2P4 P5 P6 P3
SP 5775DP 80D 8
D-IPCS-IP B
ClientIP B
clientIP A
serverSP 9157DP 80
SP 9157DP 80
S IP A S IP B IPBIP A IP CD-IPC
S-IP AD-IPC
S-IP B
Transport Layer 3-13
Connection-oriented demux Threaded Web Server
P1 P1P2P4 P3P1 P2P4 P3
SP 5775DP 80D 8
D-IPCS-IP B
ClientIP B
clientIP A
serverSP 9157DP 80
SP 9157DP 80
S IP A S IP B IPBIP A IP CD-IPC
S-IP AD-IPC
S-IP B
Transport Layer 3-14
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-15
UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]
ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP
Why is there a UDPno connection
ldquobest effortrdquo service UDP segments may be
lost
establishment (which can add delay)simple no connection state
delivered out of order to app
c nn cti nl ss
simple no connection state at sender receiversmall segment header
connectionlessno handshaking between UDP sender receiver
no congestion control UDP can blast away as fast as desired
each UDP segment handled independently of others
Transport Layer 3-16
of others
UDP moreUDP moreoften used for streaming often used for streaming multimedia apps
loss tolerant source port dest port
32 bits
Length in f rate sensitive
other UDP usesDNS
length checksumbytes of UDPsegmentincluding
h dDNSSNMP
reliable transfer over UDP Application
header
reliable transfer over UDP add reliability at application layer
li ti s ifi
ppdata
(message)application-specific error recovery
UDP segment format
Transport Layer 3-17
UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted
segment
Sendertreat segment contents
f 16 bit
Receivercompute checksum of
i d tas sequence of 16-bit integerschecksum addition (1rsquos
received segmentcheck if computed checksum equals checksum field value(
complement sum) of segment contentssender puts checksum
qNO - error detectedYES - no error detected B t m b s sender puts checksum
value into UDP checksum field
But maybe errors nonetheless More later hellip
Transport Layer 3-18
Internet Checksum ExamplepNote
When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
wraparound
sumh k
Transport Layer 3-19
1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-20
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-21
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-22
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-23
complexity of reliable data transfer protocol (rdt)
Reliable data transfer getting startedReliable data transfer getting started
rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to
deliver to receiver upper layer
deliver_data() called by rdt to deliver data to upper
send receivesendside
receiveside
udt_send() called by rdtto transfer packet over
li bl h l t i
rdt_rcv() called when packet arrives on rcv-side of channel
Transport Layer 3-24
unreliable channel to receiver
Reliable data transfer getting startedReliable data transfer getting startedWersquoll
incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer
but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver
event causing state transition
state state
gactions taken on state transition
state when in this ldquostaterdquo next state
1state
2state next state
uniquely determined by next event
eventactions
Transport Layer 3-25
Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel
underlying channel perfectly reliabley g p yno bit errorsno loss of packets
separate FSMs for sender receiversender sends data into underlying channel
i d d t f d l i h lreceiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
udt_send(packet)
sender receiver
Transport Layer 3-26
Rdt2 0 channel with bit errorsRdt20 channel with bit errors
underlying channel may flip bits in packety g y p pchecksum to detect bit errors
the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK
new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender
Transport Layer 3-27
rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowdΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
belowsender
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-28
udt_send(ACK)
rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-29
udt_send(ACK)
rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-30
udt_send(ACK)
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Connection-oriented demux (cont)
P1 P1P2P4 P5 P6 P3P1 P2P4 P5 P6 P3
SP 5775DP 80D 8
D-IPCS-IP B
ClientIP B
clientIP A
serverSP 9157DP 80
SP 9157DP 80
S IP A S IP B IPBIP A IP CD-IPC
S-IP AD-IPC
S-IP B
Transport Layer 3-13
Connection-oriented demux Threaded Web Server
P1 P1P2P4 P3P1 P2P4 P3
SP 5775DP 80D 8
D-IPCS-IP B
ClientIP B
clientIP A
serverSP 9157DP 80
SP 9157DP 80
S IP A S IP B IPBIP A IP CD-IPC
S-IP AD-IPC
S-IP B
Transport Layer 3-14
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-15
UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]
ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP
Why is there a UDPno connection
ldquobest effortrdquo service UDP segments may be
lost
establishment (which can add delay)simple no connection state
delivered out of order to app
c nn cti nl ss
simple no connection state at sender receiversmall segment header
connectionlessno handshaking between UDP sender receiver
no congestion control UDP can blast away as fast as desired
each UDP segment handled independently of others
Transport Layer 3-16
of others
UDP moreUDP moreoften used for streaming often used for streaming multimedia apps
loss tolerant source port dest port
32 bits
Length in f rate sensitive
other UDP usesDNS
length checksumbytes of UDPsegmentincluding
h dDNSSNMP
reliable transfer over UDP Application
header
reliable transfer over UDP add reliability at application layer
li ti s ifi
ppdata
(message)application-specific error recovery
UDP segment format
Transport Layer 3-17
UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted
segment
Sendertreat segment contents
f 16 bit
Receivercompute checksum of
i d tas sequence of 16-bit integerschecksum addition (1rsquos
received segmentcheck if computed checksum equals checksum field value(
complement sum) of segment contentssender puts checksum
qNO - error detectedYES - no error detected B t m b s sender puts checksum
value into UDP checksum field
But maybe errors nonetheless More later hellip
Transport Layer 3-18
Internet Checksum ExamplepNote
When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
wraparound
sumh k
Transport Layer 3-19
1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-20
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-21
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-22
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-23
complexity of reliable data transfer protocol (rdt)
Reliable data transfer getting startedReliable data transfer getting started
rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to
deliver to receiver upper layer
deliver_data() called by rdt to deliver data to upper
send receivesendside
receiveside
udt_send() called by rdtto transfer packet over
li bl h l t i
rdt_rcv() called when packet arrives on rcv-side of channel
Transport Layer 3-24
unreliable channel to receiver
Reliable data transfer getting startedReliable data transfer getting startedWersquoll
incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer
but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver
event causing state transition
state state
gactions taken on state transition
state when in this ldquostaterdquo next state
1state
2state next state
uniquely determined by next event
eventactions
Transport Layer 3-25
Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel
underlying channel perfectly reliabley g p yno bit errorsno loss of packets
separate FSMs for sender receiversender sends data into underlying channel
i d d t f d l i h lreceiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
udt_send(packet)
sender receiver
Transport Layer 3-26
Rdt2 0 channel with bit errorsRdt20 channel with bit errors
underlying channel may flip bits in packety g y p pchecksum to detect bit errors
the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK
new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender
Transport Layer 3-27
rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowdΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
belowsender
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-28
udt_send(ACK)
rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-29
udt_send(ACK)
rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-30
udt_send(ACK)
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-15
UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]
ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP
Why is there a UDPno connection
ldquobest effortrdquo service UDP segments may be
lost
establishment (which can add delay)simple no connection state
delivered out of order to app
c nn cti nl ss
simple no connection state at sender receiversmall segment header
connectionlessno handshaking between UDP sender receiver
no congestion control UDP can blast away as fast as desired
each UDP segment handled independently of others
Transport Layer 3-16
of others
UDP moreUDP moreoften used for streaming often used for streaming multimedia apps
loss tolerant source port dest port
32 bits
Length in f rate sensitive
other UDP usesDNS
length checksumbytes of UDPsegmentincluding
h dDNSSNMP
reliable transfer over UDP Application
header
reliable transfer over UDP add reliability at application layer
li ti s ifi
ppdata
(message)application-specific error recovery
UDP segment format
Transport Layer 3-17
UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted
segment
Sendertreat segment contents
f 16 bit
Receivercompute checksum of
i d tas sequence of 16-bit integerschecksum addition (1rsquos
received segmentcheck if computed checksum equals checksum field value(
complement sum) of segment contentssender puts checksum
qNO - error detectedYES - no error detected B t m b s sender puts checksum
value into UDP checksum field
But maybe errors nonetheless More later hellip
Transport Layer 3-18
Internet Checksum ExamplepNote
When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
wraparound
sumh k
Transport Layer 3-19
1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-20
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-21
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-22
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-23
complexity of reliable data transfer protocol (rdt)
Reliable data transfer getting startedReliable data transfer getting started
rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to
deliver to receiver upper layer
deliver_data() called by rdt to deliver data to upper
send receivesendside
receiveside
udt_send() called by rdtto transfer packet over
li bl h l t i
rdt_rcv() called when packet arrives on rcv-side of channel
Transport Layer 3-24
unreliable channel to receiver
Reliable data transfer getting startedReliable data transfer getting startedWersquoll
incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer
but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver
event causing state transition
state state
gactions taken on state transition
state when in this ldquostaterdquo next state
1state
2state next state
uniquely determined by next event
eventactions
Transport Layer 3-25
Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel
underlying channel perfectly reliabley g p yno bit errorsno loss of packets
separate FSMs for sender receiversender sends data into underlying channel
i d d t f d l i h lreceiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
udt_send(packet)
sender receiver
Transport Layer 3-26
Rdt2 0 channel with bit errorsRdt20 channel with bit errors
underlying channel may flip bits in packety g y p pchecksum to detect bit errors
the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK
new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender
Transport Layer 3-27
rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowdΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
belowsender
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-28
udt_send(ACK)
rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-29
udt_send(ACK)
rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-30
udt_send(ACK)
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
UDP moreUDP moreoften used for streaming often used for streaming multimedia apps
loss tolerant source port dest port
32 bits
Length in f rate sensitive
other UDP usesDNS
length checksumbytes of UDPsegmentincluding
h dDNSSNMP
reliable transfer over UDP Application
header
reliable transfer over UDP add reliability at application layer
li ti s ifi
ppdata
(message)application-specific error recovery
UDP segment format
Transport Layer 3-17
UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted
segment
Sendertreat segment contents
f 16 bit
Receivercompute checksum of
i d tas sequence of 16-bit integerschecksum addition (1rsquos
received segmentcheck if computed checksum equals checksum field value(
complement sum) of segment contentssender puts checksum
qNO - error detectedYES - no error detected B t m b s sender puts checksum
value into UDP checksum field
But maybe errors nonetheless More later hellip
Transport Layer 3-18
Internet Checksum ExamplepNote
When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
wraparound
sumh k
Transport Layer 3-19
1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-20
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-21
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-22
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-23
complexity of reliable data transfer protocol (rdt)
Reliable data transfer getting startedReliable data transfer getting started
rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to
deliver to receiver upper layer
deliver_data() called by rdt to deliver data to upper
send receivesendside
receiveside
udt_send() called by rdtto transfer packet over
li bl h l t i
rdt_rcv() called when packet arrives on rcv-side of channel
Transport Layer 3-24
unreliable channel to receiver
Reliable data transfer getting startedReliable data transfer getting startedWersquoll
incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer
but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver
event causing state transition
state state
gactions taken on state transition
state when in this ldquostaterdquo next state
1state
2state next state
uniquely determined by next event
eventactions
Transport Layer 3-25
Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel
underlying channel perfectly reliabley g p yno bit errorsno loss of packets
separate FSMs for sender receiversender sends data into underlying channel
i d d t f d l i h lreceiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
udt_send(packet)
sender receiver
Transport Layer 3-26
Rdt2 0 channel with bit errorsRdt20 channel with bit errors
underlying channel may flip bits in packety g y p pchecksum to detect bit errors
the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK
new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender
Transport Layer 3-27
rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowdΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
belowsender
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-28
udt_send(ACK)
rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-29
udt_send(ACK)
rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-30
udt_send(ACK)
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Internet Checksum ExamplepNote
When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult
Example add two 16-bit integers
1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
wraparound
sumh k
Transport Layer 3-19
1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-20
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-21
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-22
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-23
complexity of reliable data transfer protocol (rdt)
Reliable data transfer getting startedReliable data transfer getting started
rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to
deliver to receiver upper layer
deliver_data() called by rdt to deliver data to upper
send receivesendside
receiveside
udt_send() called by rdtto transfer packet over
li bl h l t i
rdt_rcv() called when packet arrives on rcv-side of channel
Transport Layer 3-24
unreliable channel to receiver
Reliable data transfer getting startedReliable data transfer getting startedWersquoll
incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer
but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver
event causing state transition
state state
gactions taken on state transition
state when in this ldquostaterdquo next state
1state
2state next state
uniquely determined by next event
eventactions
Transport Layer 3-25
Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel
underlying channel perfectly reliabley g p yno bit errorsno loss of packets
separate FSMs for sender receiversender sends data into underlying channel
i d d t f d l i h lreceiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
udt_send(packet)
sender receiver
Transport Layer 3-26
Rdt2 0 channel with bit errorsRdt20 channel with bit errors
underlying channel may flip bits in packety g y p pchecksum to detect bit errors
the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK
new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender
Transport Layer 3-27
rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowdΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
belowsender
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-28
udt_send(ACK)
rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-29
udt_send(ACK)
rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-30
udt_send(ACK)
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-21
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-22
complexity of reliable data transfer protocol (rdt)
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-23
complexity of reliable data transfer protocol (rdt)
Reliable data transfer getting startedReliable data transfer getting started
rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to
deliver to receiver upper layer
deliver_data() called by rdt to deliver data to upper
send receivesendside
receiveside
udt_send() called by rdtto transfer packet over
li bl h l t i
rdt_rcv() called when packet arrives on rcv-side of channel
Transport Layer 3-24
unreliable channel to receiver
Reliable data transfer getting startedReliable data transfer getting startedWersquoll
incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer
but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver
event causing state transition
state state
gactions taken on state transition
state when in this ldquostaterdquo next state
1state
2state next state
uniquely determined by next event
eventactions
Transport Layer 3-25
Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel
underlying channel perfectly reliabley g p yno bit errorsno loss of packets
separate FSMs for sender receiversender sends data into underlying channel
i d d t f d l i h lreceiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
udt_send(packet)
sender receiver
Transport Layer 3-26
Rdt2 0 channel with bit errorsRdt20 channel with bit errors
underlying channel may flip bits in packety g y p pchecksum to detect bit errors
the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK
new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender
Transport Layer 3-27
rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowdΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
belowsender
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-28
udt_send(ACK)
rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-29
udt_send(ACK)
rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-30
udt_send(ACK)
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics
characteristics of unreliable channel will determine
Transport Layer 3-23
complexity of reliable data transfer protocol (rdt)
Reliable data transfer getting startedReliable data transfer getting started
rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to
deliver to receiver upper layer
deliver_data() called by rdt to deliver data to upper
send receivesendside
receiveside
udt_send() called by rdtto transfer packet over
li bl h l t i
rdt_rcv() called when packet arrives on rcv-side of channel
Transport Layer 3-24
unreliable channel to receiver
Reliable data transfer getting startedReliable data transfer getting startedWersquoll
incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer
but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver
event causing state transition
state state
gactions taken on state transition
state when in this ldquostaterdquo next state
1state
2state next state
uniquely determined by next event
eventactions
Transport Layer 3-25
Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel
underlying channel perfectly reliabley g p yno bit errorsno loss of packets
separate FSMs for sender receiversender sends data into underlying channel
i d d t f d l i h lreceiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
udt_send(packet)
sender receiver
Transport Layer 3-26
Rdt2 0 channel with bit errorsRdt20 channel with bit errors
underlying channel may flip bits in packety g y p pchecksum to detect bit errors
the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK
new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender
Transport Layer 3-27
rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowdΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
belowsender
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-28
udt_send(ACK)
rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-29
udt_send(ACK)
rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-30
udt_send(ACK)
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Reliable data transfer getting startedReliable data transfer getting startedWersquoll
incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer
but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver
event causing state transition
state state
gactions taken on state transition
state when in this ldquostaterdquo next state
1state
2state next state
uniquely determined by next event
eventactions
Transport Layer 3-25
Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel
underlying channel perfectly reliabley g p yno bit errorsno loss of packets
separate FSMs for sender receiversender sends data into underlying channel
i d d t f d l i h lreceiver read data from underlying channel
Wait for call from above packet = make_pkt(data)
udt send(packet)
rdt_send(data)
extract (packetdata)deliver_data(data)
Wait for call from
below
rdt_rcv(packet)
udt_send(packet)
sender receiver
Transport Layer 3-26
Rdt2 0 channel with bit errorsRdt20 channel with bit errors
underlying channel may flip bits in packety g y p pchecksum to detect bit errors
the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK
new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender
Transport Layer 3-27
rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowdΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
belowsender
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-28
udt_send(ACK)
rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-29
udt_send(ACK)
rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-30
udt_send(ACK)
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Rdt2 0 channel with bit errorsRdt20 channel with bit errors
underlying channel may flip bits in packety g y p pchecksum to detect bit errors
the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK
new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender
Transport Layer 3-27
rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowdΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
belowsender
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-28
udt_send(ACK)
rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-29
udt_send(ACK)
rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-30
udt_send(ACK)
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-29
udt_send(ACK)
rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)
rdt_send(data)
Wait for
snkpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
belowΛ
rdt_rcv(rcvpkt) ampamp t t( kt)
below
extract(rcvpktdata)deliver_data(data)udt send(ACK)
notcorrupt(rcvpkt)
Transport Layer 3-30
udt_send(ACK)
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
rdt2 0 has a fatal flawrdt20 has a fatal flaw
Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what
Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what
happened at receivercanrsquot just retransmit
pkt if ACKNAK garbledsender adds sequence number to each pkt
possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt
Sender sends one packet stop and wait
p then waits for receiver response
Transport Layer 3-31
rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs
sndpkt make pkt(0 data checks m)
rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for
call 0 from above
ACK or NAK 0 udt_send(sndpkt)
isNAK(rcvpkt) )
rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)
ΛΛ
dt d(d t )
rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
Wait forcall 1 from
above
Wait for ACK or NAK 1
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)
rdt_send(data)
udt_send(sndpkt)
( p ( p ) ||isNAK(rcvpkt) )
Transport Layer 3-32
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)
dt d( d kt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)
sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)
Wait for 0 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)
Wait for 1 from below
rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
has_seq1(rcvpkt)
sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)
ampamp has_seq1(rcvpkt)
extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)
Transport Layer 3-33
sndpkt make_pkt(ACK chksum)udt_send(sndpkt)
rdt2 1 discussionrdt21 discussion
d R iSenderseq added to pkt
Receivermust check if received
k t is d li ttwo seq rsquos (01) will suffice Why
t h k if i d
packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received
ACKNAK corrupted twice as many states
or s p ct p t seq
note receiver can notk f l twice as many states
state must ldquorememberrdquo whether ldquocurrentrdquo pkt
know if its last ACKNAK received OK at senderp
has 0 or 1 seq at sender
Transport Layer 3-34
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
rdt2 2 a NAK-free protocolrdt22 a NAK free protocol
f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK
receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt
Transport Layer 3-35
rdt22 sender receiver fragmentsrdt22 sender receiver fragments
sndpkt = make pkt(0 data checksum)rdt_send(data)
Wait for
sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp
( corrupt(rcvpkt) ||isACK(rcvpkt1) )
Wait for ACKcall 0 from
above udt_send(sndpkt)
isACK(rcvpkt1) )
rdt rcv(rcvpkt)
ACK0
sender FSMfragment rdt_rcv(rcvpkt)
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
fragment
rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ
Wait for 0 from below
(corrupt(rcvpkt) ||has_seq1(rcvpkt))
udt_send(sndpkt)
receiver FSMfragment
Λ
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)
extract(rcvpktdata)
Transport Layer 3-36
deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
rdt3 0 channels with errors and lossrdt30 channels with errors and loss
N i h d i New assumptionunderlying channel can also lose packets (data
Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data
or ACKs)checksum seq ACKs
time for ACK retransmits if no ACK received in this time q
retransmissions will be of help but not enough
if pkt (or ACK) just delayed (not lost)
retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed
requires countdown timer
Transport Layer 3-37
requires countdown timer
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
dt d( d kt)
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||
C ( 1) )udt_send(sndpkt)start_timer
Wait for
isACK(rcvpkt1) )
timeout
rdt_rcv(rcvpkt)
Wait for call 0from
ΛΛ
for ACK0
rdt_rcv(rcvpkt) ampamp t t( kt)
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)
udt_send(sndpkt)start_timer
call 0from above
Wait for
ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)
ampamp isACK(rcvpkt1)
stop_timerstop_timer
Wait Wait for call 1 from
above
rdt send(data)
udt_send(sndpkt)start_timer
timeout for ACK1
Λrdt_rcv(rcvpkt)
sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer
rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )
Λ
Transport Layer 3-38
Λ
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
rdt3 0 in actionrdt30 in action
Transport Layer 3-39
rdt3 0 in actionrdt30 in action
Transport Layer 3-40
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Performance of rdt3 0Performance of rdt30
d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet
dsmicrosecon8bps10
bits80009
===R
Ldtrans
U sender utilization ndash fraction of time sender busy sending
bps10R
U sender =
008
30008 = 000027 L R
RTT + L R =
1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources
Transport Layer 3-41
rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver
first packet bit transmitted t = 0last packet bit transmitted t = L R
RTTfirst packet bit arriveslast packet bit arrives send ACK
ACK arrives send next packet t = RTT + L Rp
U sender =
008
30008 = 000027 L R
RTT + L R =
Transport Layer 3-42
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y
be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver
Two generic forms of pipelined protocols go-Back-N selective repeat
Transport Layer 3-43
selective repeat
Pipelining increased utilizationPipelining increased utilization
first packet bit transmitted t = 0
sender receiver
first packet bit transmitted t = 0last bit transmitted t = L R
RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK
ACK arrives send next packet t = RTT + L R
last bit of 3rd packet arrives send ACK
Increase utilizationby a factor of 3
U sender =
024
30008 = 00008 3 L R
RTT + L R =
Transport Layer 3-44
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Pipelining ProtocolsPipelining Protocols
G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in
Selective Repeat big picSender can have up to N unacked packets in N unacked packets in
pipelineRcvr only sends
N unacked packets in pipelineRcvr acks individual y
cumulative acksDoesnrsquot ack packet if therersquos a gap
packetsSender maintains timer for each there s a gap
Sender has timer for oldest unacked packet
timer for each unacked packet
When timer expires If timer expires retransmit all unacked packets
pretransmit only unack packet
Transport Layer 3-45
Selective repeat big pictureSelective repeat big picture
d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet
When timer expires retransmit only unack When timer expires retransmit only unack packet
Transport Layer 3-46
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Go-Back-NGo-Back-NSender
k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)
timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window
Transport Layer 3-47
timeout(n) retransmit pkt n and all higher seq pkts in window
GBN sender extended FSMrdt_send(data)
if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)
udt_send(sndpkt[nextseqnum])if (base == nextseqnum)
start_timernextseqnum++
elserefuse_data(data)
base=1
Λ
Wait start_timerudt_send(sndpkt[base])
timeout
base 1nextseqnum=1
udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])
rdt rcv(rcvpkt) ampamp
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)
base = getacknum(rcvpkt)+1If (base == nextseqnum)
t ti
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)
Transport Layer 3-48
stop_timerelsestart_timer
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
GBN receiver extended FSMGBN receiver extended FSM
udt_send(sndpkt)
default
rdt rcv(rcvpkt)
Wait
_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)
extract(rcvpktdata)expectedseqnum=1
Λ
deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++
sndpkt = make_pkt(expectedseqnumACKchksum)
ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq
may generate duplicate ACKsneed only remember expectedseqnumy
out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering
Transport Layer 3-49
Re-ACK pkt with highest in-order seq
GBN inGBN inaction
Transport Layer 3-50
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Selective RepeatSelective Repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts
buffers pkts as needed for eventual in-order delivery p yto upper layer
sender only resends pkts for which ACK not i dreceived
sender timer for each unACKed pkts nd indsender window
N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts
Transport Layer 3-51
Selective repeat sender receiver windowsSelective repeat sender receiver windows
Transport Layer 3-52
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Selective repeatS ct r p at
d f b sender
kt i receiver
data from above if next available seq in window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n)out of order bufferwindow send pkt
timeout(n)resend pkt n restart timer
out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r
ACK(n) in [sendbasesendbase+N]
mark pkt n as received
pkts) advance window to next not-yet-received pkt
pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq
pkt n in [rcvbase-Nrcvbase-1]
ACK(n)otherwisenext unACKed seq otherwise
ignore
Transport Layer 3-53
Selective repeat in actionp
Transport Layer 3-54
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Selective repeatdilemma
Example Example seq rsquos 0 1 2 3window size=3window size 3
receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)
Q what relationship between seq size
Transport Layer 3-55
between seq size and window size
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-56
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581
f ll d l d ti t t i t full duplex databi-directional data flow in same connection
point-to-pointone sender one receiver
reliable in order byte n same connect onMSS maximum segment size
reliable in-order byte steam
no ldquomessage boundariesrdquoconnection-oriented
handshaking (exchange of control msgs) initrsquos
no message boundar espipelined
TCP congestion and flow of control msgs) init s sender receiver state before data exchange
gcontrol set window size
send amp receive buffersflow controlled
sender will not overwhelm receiver
socketdoor
TCP TCP
socketdoor
applicationwrites data
applicationreads data
Transport Layer 3-57
overwhelm receiverTCPsend buffer
TCPreceive buffer
segment
TCP segment structure
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
Receive windowUrg data pnterchecksum
FSRPAUheadlen
notused
validPSH push data now(generally not used) bytes
illin
(not segments)
Urg data pnter
Options (variable length)RST SYN FINconnection estab( d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-58
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B
byte stream ldquonumberrdquo of first byte in segmentrsquos
Host A Host B
Usertypes
lsquoCrsquoy gmdata
ACKs f t b t
Chost ACKsreceipt oflsquoCrsquo echoes
b k lsquoCrsquoseq of next byte expected from other side host ACKs
back lsquoCrsquo
cumulative ACKQ how receiver handles
out of order segments
host ACKsreceipt
of echoedlsquoCrsquo
out-of-order segmentsA TCP spec doesnrsquot say - up to time
simple telnet scenario
Transport Layer 3-59
implementor simple telnet scenario
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT
QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies
too short premature timeout
receiptignore retransmissions
SampleRTT will vary want timeoutunnecessary retransmissions
p yestimated RTT ldquosmootherrdquo
average several recent measurements not just too long slow reaction
to segment lossmeasurements not just current SampleRTT
Transport Layer 3-60
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT
Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125
Transport Layer 3-61
Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr
350
300
350
250
onds
)
200RTT
(mill
isec
o
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
Transport Layer 3-62
SampleRTT Estimated RTT
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout
Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo
large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT
DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|
(typically β = 025)
Then set timeout interval
TimeoutInterval = EstimatedRTT + 4DevRTT
Then set timeout interval
Transport Layer 3-63
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-64
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
TCP reliable data transferTCP reliable data transfer
TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service
Retransmissions are triggered by
timeout eventsunreliable servicePipelined segmentsCumulative acks
timeout eventsduplicate acks
Initially consider Cumulative acksTCP uses single retransmission timer
Initially consider simplified TCP sender
ignore duplicate acksretransmission timer pignore flow control congestion control
Transport Layer 3-65
TCP sender eventsdata rcvd from app
Create segment with timeout
retransmit segment Create segment with seq seq is byte-stream
retransmit segment that caused timeoutrestart timerseq is byte-stream
number of first data byte in segment
restart timerAck rcvd
If acknowledges y gstart timer if not already running (think
If acknowledges previously unacked segments
of timer as for oldest unacked segment)
i i i l
gupdate what is known to be ackedt t ti if th expiration interval
TimeOutInterval start timer if there are outstanding segments
Transport Layer 3-66
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum
TCP sender
loop (forever) switch(event)
d i d f li i b
(simplified)event data received from application above
create TCP segment with sequence number NextSeqNum if (timer currently not running)
start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)
t ti ti t
Commentbull SendBase-1 last cumulatively event timer timeout
retransmit not-yet-acknowledged segment with smallest sequence number
start timer
cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer
event ACK received with ACK field value of y if (y gt SendBase)
S dB
SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer
y that new data is acked
Transport Layer 3-67
end of loop forever
TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B
eout
eq=9
2 ti
me
l
tim
eout
X
Seloss
ut
Sendbase= 100
=92
tim
eou
SendBase= 120
100
premature timeout
Seq
SendBase= 100 SendBase
= 120
Transport Layer 3-68
timep
lost ACK scenariotime
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B
l
tim
eout
Xloss
SendBase = 120
Cumulative ACK scenariotime
Transport Layer 3-69
TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]
E t t R i TCP R i tiEvent at Receiver
Arrival of in-order segment with
TCP Receiver action
Delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
A i l f i d t ith
for next segment If no next segmentsend ACK
Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending
Immediately send single cumulative ACK ACKing both in-order segments
Arrival of out-of-order segmenthigher-than-expect seq Gap detected
Immediately send duplicate ACK indicating seq of next expected byte
Gap detected
Arrival of segment that partially or completely fills gap
Immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-70
partially or completely fills gap segment starts at lower end of gap
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Fast RetransmitFast Retransmit
Ti i d f If d i 3 Time-out period often relatively long
long delay before
If sender receives 3 ACKs for the same data it supposes that long delay before
resending lost packetDetect lost segments
data it supposes that segment after ACKed data was lostgm
via duplicate ACKsSender often sends
t b k t
fast retransmit resend segment before timer expiresmany segments back-to-
backIf segment is lost
expires
f gm there will likely be many duplicate ACKs
Transport Layer 3-71
Host A Host B
X
tim
eout
t
time
Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Fast retransmit algorithmFast retransmit algorithm
event ACK received with ACK field value of y if (y gt SendBase)
SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)
start timer
else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )
resend segment with sequence number y
a duplicate ACK for already ACKed segment
fast retransmit
Transport Layer 3-73
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-74
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
TCP Flow ControlTCP Flow Control
i id f TCP sender wonrsquot overflowflow control
receive side of TCP connection has a receive buffer
sender won t overflowreceiverrsquos buffer by
transmitting too muchtoo fastreceive buffer
d t hi
too fast
speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be
slow at reading from buffer
Transport Layer 3-75
buffer
TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in
(S TCP i
segmentsSender limits unACKed
(Suppose TCP receiver discards out-of-order segments)
data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)
spare room in buffer= RcvWindow
buffer doesn t overflow
RcvWindow
= RcvBuffer-[LastByteRcvd -LastByteRead]
Transport Layer 3-76
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-77
TCP Connection ManagementRecall TCP sender receiver
establish ldquoconnectionrdquo b f h i d t
Three way handshakeStep 1 client host sends TCP before exchanging data
segmentsinitialize TCP variables
Step 1 client host sends TCP SYN segment to server
specifies initial seq seq sbuffers flow control i f ( R Wi d )
pno data
Step 2 server host receives YN li i h YNACK info (eg RcvWindow)
client connection initiatorSocket clientSocket = new
SYN replies with SYNACK segment
server allocates buffersSocket(hostnameport
number)
server contacted by client
server allocates buffersspecifies server initial seq
server contacted by clientSocket connectionSocket = welcomeSocketaccept()
Step 3 client receives SYNACK replies with ACK segment which may contain data
Transport Layer 3-78
m y
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
TCP C ti M t ( t )TCP Connection Management (cont)
Closing a connection
client closes socket
client server
closeclient closes socketclientSocketclose()
Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server
close
Step 2 server receives FIN replies with ACK ed
wai
t
pCloses connection sends FIN closed
tim
e
Transport Layer 3-79
TCP C ti M t ( t )TCP Connection Management (cont)
Step 3 client receives FIN replies with ACK
client server
closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs
Step 4 server receives ACK Connection closed
closing
ACK Connection closed
Note with small modification can handle ed
wai
t
closedmodification can handle simultaneous FINs
closed
tim
e
Transport Layer 3-80
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
TCP Connection Management (cont)TCP Connection Management (cont)
TCP serverlifecycle
TCP clientlifecyclelifecycle
Transport Layer 3-81
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-82
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Principles of Congestion ControlPrinciples of Congestion Control
Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations
lost packets (buffer overflow at routers)long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-83
Causescosts of congestion scenario 1Causescosts of congestion scenario 1
two senders two Host A
λin original dataλouttwo senders two
receiversone router
unlimited shared output link buffers
Host Bone router infinite buffers no retransmission
l d l large delays when congested
i maximum achievable throughput
Transport Layer 3-84
throughput
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Causescosts of congestion scenario 2Causescosts of congestion scenario 2
one router finite buffers one router finite buffers sender retransmission of lost packet
Host A λin original data
λout
finite shared output Host B
λin original data plus retransmitted data
link buffers
Transport Layer 3-85
Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss
λin
λout
=
λ λgtp f m y
retransmission of delayed (not lost) packet makes larger (than perfect case) for same
λin
λout
gtλ
inλ( p f ) f λ
outR2R2 R2
λ out
λ out
λ out
R4
R3
R2λin
b
R2λin
a
R2λin
c
ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo
ba c
Transport Layer 3-86
punneeded retransmissions link carries multiple copies of pkt
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ
inQ what happens as
multihop pathstimeoutretransmit
inQ pp
and increase λin
Host Aλin original data
λout
λin original data plus retransmitted data
finite shared output link buffers
Host B
Transport Layer 3-87
Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho
λost A
H
ou
t
Host B
Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission
capacity used for that packet was wasted
Transport Layer 3-88
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Approaches towards congestion controlApproaches towards congestion control
Two broad approaches towards congestion control
End-end congestion Network-assisted
Two broad approaches towards congestion control
gcontrolno explicit feedback from network
congestion controlrouters provide feedback to end systemsnetwork
congestion inferred from end-system observed loss
to end systemssingle bit indicating congestion (SNA
delayapproach taken by TCP
DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at
Transport Layer 3-89
Case study ATM ABR congestion controlCase study ATM ABR congestion control
BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path
RM (resource management) cellssent by sender interspersed if sender s path
ldquounderloadedrdquo sender should use
sent by sender interspersed with data cellsbits in RM cell set by switches
available bandwidthif senderrsquos path congested
(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested
sender throttled to minimum guaranteed
t
(mild congestion)CI bit congestion indication
rate RM cells returned to sender by receiver with bits intact
Transport Layer 3-90
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Case study ATM ABR congestion controlCase study ATM ABR congestion control
two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path
llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell
Transport Layer 3-91
bit in returned RM cell
Chapter 3 outlineChapter 3 outline
3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d
35 Connection-oriented transport TCP
segment structure32 Multiplexing and demultiplexing3 3 Connectionless
segment structurereliable data transferflow control33 Connectionless
transport UDP3 4 Principles of
fconnection management
36 Principles of 34 Principles of reliable data transfer
pcongestion control37 TCP congestion
lcontrol
Transport Layer 3-92
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
TCP congestion control additive increase gmultiplicative decrease
Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs
additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after
24 Kbytes
congestionwindow
loss w
siz
e
16 Kbytes
on w
indo
w
Saw toothbehavior probing
for bandwidth8 Kbytes
titimeco
nges
tiofor bandwidth
Transport Layer 3-93
timetime
TCP Congestion Control detailsTCP Congestion Control details
d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked
le CongWin
How does sender perceive congestionl ss t ti t le CongWin
Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi
CongWin is dynamic function
TCP sender reduces rate (CongWin) after loss event
rate = CongWinRTT Bytessec
CongWin is dynamic function of perceived network congestion
nthree mechanisms
AIMDcongestionslow startconservative after time ut events
Transport Layer 3-94
timeout events
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
TCP Slow StartTCP Slow Start
Wh i b i Wh i b i When connection begins CongWin = 1 MSS
Example MSS = 500
When connection begins increase rate exponentially fast until Example MSS = 500
bytes amp RTT = 200 msecinitial rate = 20 kbps
exponentially fast until first loss event
available bandwidth may be gtgt MSSRTT
desirable to quickly ramp up to respectable rate
Transport Layer 3-95
TCP Slow Start (more)TCP Slow Start (more)
Wh i When connection begins increase rate exponentially until
Host A
T
Host B
exponentially until first loss event
double CongWin every
RTT
g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received
Summary initial rate yis slow but ramps up exponentially fast time
Transport Layer 3-96
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs
CongWin is cut in halfi d th s
Philosophywindow then grows linearly
But after timeout event3 dup ACKs indicates
network capable of But after timeout eventCongWin instead set to 1 MSS
n tw r capa f delivering some segments
timeout indicates a ldquo l rdquo 1 MSS
window then grows exponentially
ldquomore alarmingrdquo congestion scenario
exponentiallyto a threshold then grows linearly
Transport Layer 3-97
g y
RefinementQ When should the
exponential exponential increase switch to linear
A When CongWingets to 12 of its value before timeout
ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event
Transport Layer 3-98
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Summary TCP Congestion Control
When CongWin is below Threshold sender in slow-start phase window grows exponentially
When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly
When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold
When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS
Transport Layer 3-99
TCP sender congestion controlgState Event TCP Sender Action Commentary
Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)
ACK receipt for previously unacked data
CongWin = CongWin + MSS If (CongWin gt Threshold)
set state to ldquoCongestion Avoidance
Resulting in a doubling of CongWin every RTT
CongestionAvoidance (CA)
ACK receipt for previously unacked data
CongWin = CongWin+MSS (MSSCongWin)
Additive increase resulting in increase of CongWin by 1 MSS every RTT
SS or CA Loss event detected by triple duplicate
Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance
Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate
ACKAvoidance drop below 1 MSS
SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS
Enter slow start
Set state to ldquoSlow Start
SS or CA Duplicate ACK
Increment duplicate ACK count for segment being acked
CongWin and Threshold not changed
Transport Layer 3-100
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
TCP throughputTCP throughput
h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT
Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT
Transport Layer 3-101
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate
MSSsdot221
L = 210-10 WowLRTT
New versions of TCP for high-speed
Transport Layer 3-102
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
TCP FairnessFairness goal if K TCP sessions share same
TCP FairnessFairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it R
TCP connection 2 capacity Rconnect on
Transport Layer 3-103
Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions
Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally
R equal bandwidth share
i id ddi i iloss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC nn ti n 1 th hp t
Transport Layer 3-104
RConnection 1 throughput
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP
Fairness (more)Fairness and UDP
M lti di ft Fairness and parallel TCP
connectionsMultimedia apps often do not use TCP
do not want rate
connectionsnothing prevents app from opening parallel do not want rate
throttled by congestion control
Instead use UDP
p g pconnections between 2 hostsWeb browsers do this Instead use UDP
pump audiovideo at constant rate tolerate
k t l
Web browsers do this Example link of rate R supporting 9 connections packet loss
Research area TCP friendly
supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2
Transport Layer 3-105
Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services
multiplexing p gdemultiplexingreliable data transferflow controlcongestion control
Nextleaving the network
instantiation and implementation in the I
gldquoedgerdquo (application transport layers)
InternetUDP
P
into the network ldquocorerdquo
Transport Layer 3-106
TCP