ch 3chapter 3 transport layer

53
Ch 3 Chapter 3 Transport Layer Transport Layer Computer Networking: A T D A h A note on the use of these ppt slides: We’re making these slides freely available to all (faculty, students, readers). They’re in PowerPoint form so you can add, modify, and delete slides (including this one) and slide content to suit your needs. They obviously A Top Down Approach 5 th edition. Jim Kurose, Keith Ross Addis W sl A il represent a lot of work on our part. In return for use, we only ask the following: If you use these slides (e.g., in a class) in substantially unaltered form, that you mention their source (after all, we’d like people to use our book!) If you post any slides in substantially unaltered form on a www site that Addison-Wesley, April 2009. If you post any slides in substantially unaltered form on a www site, that you note that they are adapted from (or perhaps identical to) our slides, and note our copyright of this material. Thanks and enjoy! JFK/KWR Transport Layer 3-1 All material copyright 1996-2009 J.F Kurose and K.W. Ross, All Rights Reserved Chapter 3: Transport Layer Chapter 3: Transport Layer Our goals: understand principles behind transport learn about transport layer protocols in the layer services: multiplexing/demultipl xin Internet: UDP: connectionless transport exing reliable data transfer flow control transport TCP: connection-oriented transport flow control congestion control TCP congestion control Transport Layer 3-2

Upload: others

Post on 13-Feb-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ch 3Chapter 3 Transport Layer

Ch 3Chapter 3Transport LayerTransport Layer

Computer Networking A T D A h

A note on the use of these ppt slidesWersquore making these slides freely available to all (faculty students readers) Theyrsquore in PowerPoint form so you can add modify and delete slides (including this one) and slide content to suit your needs They obviously

A Top Down Approach 5th edition Jim Kurose Keith RossAddis W sl A il

( g ) y y yrepresent a lot of work on our part In return for use we only ask the following

If you use these slides (eg in a class) in substantially unaltered form that you mention their source (after all wersquod like people to use our book)

If you post any slides in substantially unaltered form on a www site that Addison-Wesley April 2009

If you post any slides in substantially unaltered form on a www site that you note that they are adapted from (or perhaps identical to) our slides and note our copyright of this material

Thanks and enjoy JFKKWR

Transport Layer 3-1

All material copyright 1996-2009JF Kurose and KW Ross All Rights Reserved

Chapter 3 Transport LayerChapter 3 Transport LayerOur goalsOur goa s

understand principles behind transport

learn about transport layer protocols in the p

layer servicesmultiplexingdemultiplxin

y pInternet

UDP connectionless transportexing

reliable data transferflow control

transportTCP connection-oriented transportflow control

congestion control TCP congestion control

Transport Layer 3-2

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-3

Transport services and protocolsTransport services and protocolsprovide logical communication

applicationtransportnetworkd li k

p gbetween app processes running on different hoststransport protocols run in

data linkphysical

transport protocols run in end systems

send side breaks app messages into segments passes to network layerrcv side reassembles applicationrcv side reassembles segments into messages passes to app layer

th t s t

pptransportnetworkdata linkphysical

more than one transport protocol available to apps

Internet TCP and UDP

Transport Layer 3-4

Transport vs network layerTransport vs network layer

k l l i l Household analogynetwork layer logical communication between hosts

Household analogy12 kids sending letters to

12 kidsbetween hoststransport layer logical communication

12 kidsprocesses = kidsapp messages = letters communication

between processes relies on enhances

app messages = letters in envelopeshosts = housesnetwork layer services hosts = housestransport protocol = Ann and BillAnn and Billnetwork-layer protocol = postal service

Transport Layer 3-5

p

Internet transport-layer protocolsp y p

reliable in-order applicationtransportreliable in order

delivery (TCP)congestion control

transportnetworkdata linkphysical

networkdata linkphysical network

flow controlconnection setup

li bl d d

physical

n t k

data linkphysical

unreliable unordered delivery UDP

no frills extension of network

networkdata linkphysical

networkdata linkphysical

no-frills extension of ldquobest-effortrdquo IP

services not available networkdata linkphysical

networkdata linkphysical application

transportnetworkdata linkphysical

delay guaranteesbandwidth guarantees

physical

Transport Layer 3-6

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-7

MultiplexingdemultiplexingMultiplexingdemultiplexingDemultiplexing at rcv host

th i d t f lti lMultiplexing at send host

delivering received segmentsto correct socket

gathering data from multiplesockets enveloping data with header (later used for

= process= socketdemultiplexing)

application

transport

P1 application

transport

application

transport

P2P3 P4P1

transport

network

link

network

link

transport

network

linklink

physical

link

physical

link

physical

host 3

Transport Layer 3-8

host 1 host 2 host 3

How demultiplexing worksHow demultiplexing workshost receives IP datagrams

h d t h each datagram has source IP address destination IP address source port dest port

32 bits

each datagram carries 1 transport-layer segmenteach segment has source

other header fieldseach segment has source destination port number

host uses IP addresses amp port applicationnumbers to direct segment to appropriate socket

ppdata

(message)

TCPUDP segment format

Transport Layer 3-9

Connectionless demultiplexingConnectionless demultiplexing

C k i h When host receives UDP Create sockets with port numbers

DatagramSocket mySocket1 = new

When host receives UDP segment

checks destination port DatagramSocket mySocket1 = new DatagramSocket(12534)

DatagramSocket mySocket2 = new DatagramSocket(12535)

pnumber in segmentdirects UDP segment to socket with that port DatagramSocket(12535)

UDP socket identified by two-tuple

socket with that port number

IP datagrams with two tuple(dest IP address dest port number)

gdifferent source IP addresses andor source

t b di t d port numbers directed to same socket

Transport Layer 3-10

Connectionless demux (cont)Connectionless demux (cont)k k k (6428)DatagramSocket serverSocket = new DatagramSocket(6428)

P2 P1P1P3P2 P1P3

SP 6428DP 9157

SP 6428DP 5775

ClientIP B

clientIP A

serverSP 9157DP 6428

SP 5775DP 6428

IPBIP A IP C

SP provides ldquoreturn addressrdquo

Transport Layer 3-11

SP provides return address

Connection-oriented demuxConnection-oriented demux

TCP k id ifi d h TCP socket identified by 4-tuple

source IP address

Server host may support many simultaneous TCP socketssource IP address

source port numberdest IP address

socketseach socket identified by its own 4-tuple

dest port numberrecv host uses all four

Web servers have different sockets for

h i livalues to direct segment to appropriate s ck t

each connecting clientnon-persistent HTTP will have different socket for socket have different socket for each request

Transport Layer 3-12

Connection-oriented demux (cont)

P1 P1P2P4 P5 P6 P3P1 P2P4 P5 P6 P3

SP 5775DP 80D 8

D-IPCS-IP B

ClientIP B

clientIP A

serverSP 9157DP 80

SP 9157DP 80

S IP A S IP B IPBIP A IP CD-IPC

S-IP AD-IPC

S-IP B

Transport Layer 3-13

Connection-oriented demux Threaded Web Server

P1 P1P2P4 P3P1 P2P4 P3

SP 5775DP 80D 8

D-IPCS-IP B

ClientIP B

clientIP A

serverSP 9157DP 80

SP 9157DP 80

S IP A S IP B IPBIP A IP CD-IPC

S-IP AD-IPC

S-IP B

Transport Layer 3-14

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]

ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP

Why is there a UDPno connection

ldquobest effortrdquo service UDP segments may be

lost

establishment (which can add delay)simple no connection state

delivered out of order to app

c nn cti nl ss

simple no connection state at sender receiversmall segment header

connectionlessno handshaking between UDP sender receiver

no congestion control UDP can blast away as fast as desired

each UDP segment handled independently of others

Transport Layer 3-16

of others

UDP moreUDP moreoften used for streaming often used for streaming multimedia apps

loss tolerant source port dest port

32 bits

Length in f rate sensitive

other UDP usesDNS

length checksumbytes of UDPsegmentincluding

h dDNSSNMP

reliable transfer over UDP Application

header

reliable transfer over UDP add reliability at application layer

li ti s ifi

ppdata

(message)application-specific error recovery

UDP segment format

Transport Layer 3-17

UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted

segment

Sendertreat segment contents

f 16 bit

Receivercompute checksum of

i d tas sequence of 16-bit integerschecksum addition (1rsquos

received segmentcheck if computed checksum equals checksum field value(

complement sum) of segment contentssender puts checksum

qNO - error detectedYES - no error detected B t m b s sender puts checksum

value into UDP checksum field

But maybe errors nonetheless More later hellip

Transport Layer 3-18

Internet Checksum ExamplepNote

When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0

wraparound

sumh k

Transport Layer 3-19

1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-20

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-21

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-22

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-23

complexity of reliable data transfer protocol (rdt)

Reliable data transfer getting startedReliable data transfer getting started

rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to

deliver to receiver upper layer

deliver_data() called by rdt to deliver data to upper

send receivesendside

receiveside

udt_send() called by rdtto transfer packet over

li bl h l t i

rdt_rcv() called when packet arrives on rcv-side of channel

Transport Layer 3-24

unreliable channel to receiver

Reliable data transfer getting startedReliable data transfer getting startedWersquoll

incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer

but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver

event causing state transition

state state

gactions taken on state transition

state when in this ldquostaterdquo next state

1state

2state next state

uniquely determined by next event

eventactions

Transport Layer 3-25

Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel

underlying channel perfectly reliabley g p yno bit errorsno loss of packets

separate FSMs for sender receiversender sends data into underlying channel

i d d t f d l i h lreceiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

udt_send(packet)

sender receiver

Transport Layer 3-26

Rdt2 0 channel with bit errorsRdt20 channel with bit errors

underlying channel may flip bits in packety g y p pchecksum to detect bit errors

the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender

Transport Layer 3-27

rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for

receiver

call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowdΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

belowsender

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-28

udt_send(ACK)

rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-29

udt_send(ACK)

rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-30

udt_send(ACK)

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 2: Ch 3Chapter 3 Transport Layer

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-3

Transport services and protocolsTransport services and protocolsprovide logical communication

applicationtransportnetworkd li k

p gbetween app processes running on different hoststransport protocols run in

data linkphysical

transport protocols run in end systems

send side breaks app messages into segments passes to network layerrcv side reassembles applicationrcv side reassembles segments into messages passes to app layer

th t s t

pptransportnetworkdata linkphysical

more than one transport protocol available to apps

Internet TCP and UDP

Transport Layer 3-4

Transport vs network layerTransport vs network layer

k l l i l Household analogynetwork layer logical communication between hosts

Household analogy12 kids sending letters to

12 kidsbetween hoststransport layer logical communication

12 kidsprocesses = kidsapp messages = letters communication

between processes relies on enhances

app messages = letters in envelopeshosts = housesnetwork layer services hosts = housestransport protocol = Ann and BillAnn and Billnetwork-layer protocol = postal service

Transport Layer 3-5

p

Internet transport-layer protocolsp y p

reliable in-order applicationtransportreliable in order

delivery (TCP)congestion control

transportnetworkdata linkphysical

networkdata linkphysical network

flow controlconnection setup

li bl d d

physical

n t k

data linkphysical

unreliable unordered delivery UDP

no frills extension of network

networkdata linkphysical

networkdata linkphysical

no-frills extension of ldquobest-effortrdquo IP

services not available networkdata linkphysical

networkdata linkphysical application

transportnetworkdata linkphysical

delay guaranteesbandwidth guarantees

physical

Transport Layer 3-6

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-7

MultiplexingdemultiplexingMultiplexingdemultiplexingDemultiplexing at rcv host

th i d t f lti lMultiplexing at send host

delivering received segmentsto correct socket

gathering data from multiplesockets enveloping data with header (later used for

= process= socketdemultiplexing)

application

transport

P1 application

transport

application

transport

P2P3 P4P1

transport

network

link

network

link

transport

network

linklink

physical

link

physical

link

physical

host 3

Transport Layer 3-8

host 1 host 2 host 3

How demultiplexing worksHow demultiplexing workshost receives IP datagrams

h d t h each datagram has source IP address destination IP address source port dest port

32 bits

each datagram carries 1 transport-layer segmenteach segment has source

other header fieldseach segment has source destination port number

host uses IP addresses amp port applicationnumbers to direct segment to appropriate socket

ppdata

(message)

TCPUDP segment format

Transport Layer 3-9

Connectionless demultiplexingConnectionless demultiplexing

C k i h When host receives UDP Create sockets with port numbers

DatagramSocket mySocket1 = new

When host receives UDP segment

checks destination port DatagramSocket mySocket1 = new DatagramSocket(12534)

DatagramSocket mySocket2 = new DatagramSocket(12535)

pnumber in segmentdirects UDP segment to socket with that port DatagramSocket(12535)

UDP socket identified by two-tuple

socket with that port number

IP datagrams with two tuple(dest IP address dest port number)

gdifferent source IP addresses andor source

t b di t d port numbers directed to same socket

Transport Layer 3-10

Connectionless demux (cont)Connectionless demux (cont)k k k (6428)DatagramSocket serverSocket = new DatagramSocket(6428)

P2 P1P1P3P2 P1P3

SP 6428DP 9157

SP 6428DP 5775

ClientIP B

clientIP A

serverSP 9157DP 6428

SP 5775DP 6428

IPBIP A IP C

SP provides ldquoreturn addressrdquo

Transport Layer 3-11

SP provides return address

Connection-oriented demuxConnection-oriented demux

TCP k id ifi d h TCP socket identified by 4-tuple

source IP address

Server host may support many simultaneous TCP socketssource IP address

source port numberdest IP address

socketseach socket identified by its own 4-tuple

dest port numberrecv host uses all four

Web servers have different sockets for

h i livalues to direct segment to appropriate s ck t

each connecting clientnon-persistent HTTP will have different socket for socket have different socket for each request

Transport Layer 3-12

Connection-oriented demux (cont)

P1 P1P2P4 P5 P6 P3P1 P2P4 P5 P6 P3

SP 5775DP 80D 8

D-IPCS-IP B

ClientIP B

clientIP A

serverSP 9157DP 80

SP 9157DP 80

S IP A S IP B IPBIP A IP CD-IPC

S-IP AD-IPC

S-IP B

Transport Layer 3-13

Connection-oriented demux Threaded Web Server

P1 P1P2P4 P3P1 P2P4 P3

SP 5775DP 80D 8

D-IPCS-IP B

ClientIP B

clientIP A

serverSP 9157DP 80

SP 9157DP 80

S IP A S IP B IPBIP A IP CD-IPC

S-IP AD-IPC

S-IP B

Transport Layer 3-14

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]

ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP

Why is there a UDPno connection

ldquobest effortrdquo service UDP segments may be

lost

establishment (which can add delay)simple no connection state

delivered out of order to app

c nn cti nl ss

simple no connection state at sender receiversmall segment header

connectionlessno handshaking between UDP sender receiver

no congestion control UDP can blast away as fast as desired

each UDP segment handled independently of others

Transport Layer 3-16

of others

UDP moreUDP moreoften used for streaming often used for streaming multimedia apps

loss tolerant source port dest port

32 bits

Length in f rate sensitive

other UDP usesDNS

length checksumbytes of UDPsegmentincluding

h dDNSSNMP

reliable transfer over UDP Application

header

reliable transfer over UDP add reliability at application layer

li ti s ifi

ppdata

(message)application-specific error recovery

UDP segment format

Transport Layer 3-17

UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted

segment

Sendertreat segment contents

f 16 bit

Receivercompute checksum of

i d tas sequence of 16-bit integerschecksum addition (1rsquos

received segmentcheck if computed checksum equals checksum field value(

complement sum) of segment contentssender puts checksum

qNO - error detectedYES - no error detected B t m b s sender puts checksum

value into UDP checksum field

But maybe errors nonetheless More later hellip

Transport Layer 3-18

Internet Checksum ExamplepNote

When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0

wraparound

sumh k

Transport Layer 3-19

1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-20

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-21

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-22

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-23

complexity of reliable data transfer protocol (rdt)

Reliable data transfer getting startedReliable data transfer getting started

rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to

deliver to receiver upper layer

deliver_data() called by rdt to deliver data to upper

send receivesendside

receiveside

udt_send() called by rdtto transfer packet over

li bl h l t i

rdt_rcv() called when packet arrives on rcv-side of channel

Transport Layer 3-24

unreliable channel to receiver

Reliable data transfer getting startedReliable data transfer getting startedWersquoll

incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer

but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver

event causing state transition

state state

gactions taken on state transition

state when in this ldquostaterdquo next state

1state

2state next state

uniquely determined by next event

eventactions

Transport Layer 3-25

Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel

underlying channel perfectly reliabley g p yno bit errorsno loss of packets

separate FSMs for sender receiversender sends data into underlying channel

i d d t f d l i h lreceiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

udt_send(packet)

sender receiver

Transport Layer 3-26

Rdt2 0 channel with bit errorsRdt20 channel with bit errors

underlying channel may flip bits in packety g y p pchecksum to detect bit errors

the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender

Transport Layer 3-27

rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for

receiver

call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowdΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

belowsender

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-28

udt_send(ACK)

rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-29

udt_send(ACK)

rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-30

udt_send(ACK)

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 3: Ch 3Chapter 3 Transport Layer

Transport vs network layerTransport vs network layer

k l l i l Household analogynetwork layer logical communication between hosts

Household analogy12 kids sending letters to

12 kidsbetween hoststransport layer logical communication

12 kidsprocesses = kidsapp messages = letters communication

between processes relies on enhances

app messages = letters in envelopeshosts = housesnetwork layer services hosts = housestransport protocol = Ann and BillAnn and Billnetwork-layer protocol = postal service

Transport Layer 3-5

p

Internet transport-layer protocolsp y p

reliable in-order applicationtransportreliable in order

delivery (TCP)congestion control

transportnetworkdata linkphysical

networkdata linkphysical network

flow controlconnection setup

li bl d d

physical

n t k

data linkphysical

unreliable unordered delivery UDP

no frills extension of network

networkdata linkphysical

networkdata linkphysical

no-frills extension of ldquobest-effortrdquo IP

services not available networkdata linkphysical

networkdata linkphysical application

transportnetworkdata linkphysical

delay guaranteesbandwidth guarantees

physical

Transport Layer 3-6

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-7

MultiplexingdemultiplexingMultiplexingdemultiplexingDemultiplexing at rcv host

th i d t f lti lMultiplexing at send host

delivering received segmentsto correct socket

gathering data from multiplesockets enveloping data with header (later used for

= process= socketdemultiplexing)

application

transport

P1 application

transport

application

transport

P2P3 P4P1

transport

network

link

network

link

transport

network

linklink

physical

link

physical

link

physical

host 3

Transport Layer 3-8

host 1 host 2 host 3

How demultiplexing worksHow demultiplexing workshost receives IP datagrams

h d t h each datagram has source IP address destination IP address source port dest port

32 bits

each datagram carries 1 transport-layer segmenteach segment has source

other header fieldseach segment has source destination port number

host uses IP addresses amp port applicationnumbers to direct segment to appropriate socket

ppdata

(message)

TCPUDP segment format

Transport Layer 3-9

Connectionless demultiplexingConnectionless demultiplexing

C k i h When host receives UDP Create sockets with port numbers

DatagramSocket mySocket1 = new

When host receives UDP segment

checks destination port DatagramSocket mySocket1 = new DatagramSocket(12534)

DatagramSocket mySocket2 = new DatagramSocket(12535)

pnumber in segmentdirects UDP segment to socket with that port DatagramSocket(12535)

UDP socket identified by two-tuple

socket with that port number

IP datagrams with two tuple(dest IP address dest port number)

gdifferent source IP addresses andor source

t b di t d port numbers directed to same socket

Transport Layer 3-10

Connectionless demux (cont)Connectionless demux (cont)k k k (6428)DatagramSocket serverSocket = new DatagramSocket(6428)

P2 P1P1P3P2 P1P3

SP 6428DP 9157

SP 6428DP 5775

ClientIP B

clientIP A

serverSP 9157DP 6428

SP 5775DP 6428

IPBIP A IP C

SP provides ldquoreturn addressrdquo

Transport Layer 3-11

SP provides return address

Connection-oriented demuxConnection-oriented demux

TCP k id ifi d h TCP socket identified by 4-tuple

source IP address

Server host may support many simultaneous TCP socketssource IP address

source port numberdest IP address

socketseach socket identified by its own 4-tuple

dest port numberrecv host uses all four

Web servers have different sockets for

h i livalues to direct segment to appropriate s ck t

each connecting clientnon-persistent HTTP will have different socket for socket have different socket for each request

Transport Layer 3-12

Connection-oriented demux (cont)

P1 P1P2P4 P5 P6 P3P1 P2P4 P5 P6 P3

SP 5775DP 80D 8

D-IPCS-IP B

ClientIP B

clientIP A

serverSP 9157DP 80

SP 9157DP 80

S IP A S IP B IPBIP A IP CD-IPC

S-IP AD-IPC

S-IP B

Transport Layer 3-13

Connection-oriented demux Threaded Web Server

P1 P1P2P4 P3P1 P2P4 P3

SP 5775DP 80D 8

D-IPCS-IP B

ClientIP B

clientIP A

serverSP 9157DP 80

SP 9157DP 80

S IP A S IP B IPBIP A IP CD-IPC

S-IP AD-IPC

S-IP B

Transport Layer 3-14

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]

ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP

Why is there a UDPno connection

ldquobest effortrdquo service UDP segments may be

lost

establishment (which can add delay)simple no connection state

delivered out of order to app

c nn cti nl ss

simple no connection state at sender receiversmall segment header

connectionlessno handshaking between UDP sender receiver

no congestion control UDP can blast away as fast as desired

each UDP segment handled independently of others

Transport Layer 3-16

of others

UDP moreUDP moreoften used for streaming often used for streaming multimedia apps

loss tolerant source port dest port

32 bits

Length in f rate sensitive

other UDP usesDNS

length checksumbytes of UDPsegmentincluding

h dDNSSNMP

reliable transfer over UDP Application

header

reliable transfer over UDP add reliability at application layer

li ti s ifi

ppdata

(message)application-specific error recovery

UDP segment format

Transport Layer 3-17

UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted

segment

Sendertreat segment contents

f 16 bit

Receivercompute checksum of

i d tas sequence of 16-bit integerschecksum addition (1rsquos

received segmentcheck if computed checksum equals checksum field value(

complement sum) of segment contentssender puts checksum

qNO - error detectedYES - no error detected B t m b s sender puts checksum

value into UDP checksum field

But maybe errors nonetheless More later hellip

Transport Layer 3-18

Internet Checksum ExamplepNote

When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0

wraparound

sumh k

Transport Layer 3-19

1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-20

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-21

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-22

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-23

complexity of reliable data transfer protocol (rdt)

Reliable data transfer getting startedReliable data transfer getting started

rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to

deliver to receiver upper layer

deliver_data() called by rdt to deliver data to upper

send receivesendside

receiveside

udt_send() called by rdtto transfer packet over

li bl h l t i

rdt_rcv() called when packet arrives on rcv-side of channel

Transport Layer 3-24

unreliable channel to receiver

Reliable data transfer getting startedReliable data transfer getting startedWersquoll

incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer

but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver

event causing state transition

state state

gactions taken on state transition

state when in this ldquostaterdquo next state

1state

2state next state

uniquely determined by next event

eventactions

Transport Layer 3-25

Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel

underlying channel perfectly reliabley g p yno bit errorsno loss of packets

separate FSMs for sender receiversender sends data into underlying channel

i d d t f d l i h lreceiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

udt_send(packet)

sender receiver

Transport Layer 3-26

Rdt2 0 channel with bit errorsRdt20 channel with bit errors

underlying channel may flip bits in packety g y p pchecksum to detect bit errors

the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender

Transport Layer 3-27

rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for

receiver

call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowdΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

belowsender

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-28

udt_send(ACK)

rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-29

udt_send(ACK)

rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-30

udt_send(ACK)

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 4: Ch 3Chapter 3 Transport Layer

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-7

MultiplexingdemultiplexingMultiplexingdemultiplexingDemultiplexing at rcv host

th i d t f lti lMultiplexing at send host

delivering received segmentsto correct socket

gathering data from multiplesockets enveloping data with header (later used for

= process= socketdemultiplexing)

application

transport

P1 application

transport

application

transport

P2P3 P4P1

transport

network

link

network

link

transport

network

linklink

physical

link

physical

link

physical

host 3

Transport Layer 3-8

host 1 host 2 host 3

How demultiplexing worksHow demultiplexing workshost receives IP datagrams

h d t h each datagram has source IP address destination IP address source port dest port

32 bits

each datagram carries 1 transport-layer segmenteach segment has source

other header fieldseach segment has source destination port number

host uses IP addresses amp port applicationnumbers to direct segment to appropriate socket

ppdata

(message)

TCPUDP segment format

Transport Layer 3-9

Connectionless demultiplexingConnectionless demultiplexing

C k i h When host receives UDP Create sockets with port numbers

DatagramSocket mySocket1 = new

When host receives UDP segment

checks destination port DatagramSocket mySocket1 = new DatagramSocket(12534)

DatagramSocket mySocket2 = new DatagramSocket(12535)

pnumber in segmentdirects UDP segment to socket with that port DatagramSocket(12535)

UDP socket identified by two-tuple

socket with that port number

IP datagrams with two tuple(dest IP address dest port number)

gdifferent source IP addresses andor source

t b di t d port numbers directed to same socket

Transport Layer 3-10

Connectionless demux (cont)Connectionless demux (cont)k k k (6428)DatagramSocket serverSocket = new DatagramSocket(6428)

P2 P1P1P3P2 P1P3

SP 6428DP 9157

SP 6428DP 5775

ClientIP B

clientIP A

serverSP 9157DP 6428

SP 5775DP 6428

IPBIP A IP C

SP provides ldquoreturn addressrdquo

Transport Layer 3-11

SP provides return address

Connection-oriented demuxConnection-oriented demux

TCP k id ifi d h TCP socket identified by 4-tuple

source IP address

Server host may support many simultaneous TCP socketssource IP address

source port numberdest IP address

socketseach socket identified by its own 4-tuple

dest port numberrecv host uses all four

Web servers have different sockets for

h i livalues to direct segment to appropriate s ck t

each connecting clientnon-persistent HTTP will have different socket for socket have different socket for each request

Transport Layer 3-12

Connection-oriented demux (cont)

P1 P1P2P4 P5 P6 P3P1 P2P4 P5 P6 P3

SP 5775DP 80D 8

D-IPCS-IP B

ClientIP B

clientIP A

serverSP 9157DP 80

SP 9157DP 80

S IP A S IP B IPBIP A IP CD-IPC

S-IP AD-IPC

S-IP B

Transport Layer 3-13

Connection-oriented demux Threaded Web Server

P1 P1P2P4 P3P1 P2P4 P3

SP 5775DP 80D 8

D-IPCS-IP B

ClientIP B

clientIP A

serverSP 9157DP 80

SP 9157DP 80

S IP A S IP B IPBIP A IP CD-IPC

S-IP AD-IPC

S-IP B

Transport Layer 3-14

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]

ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP

Why is there a UDPno connection

ldquobest effortrdquo service UDP segments may be

lost

establishment (which can add delay)simple no connection state

delivered out of order to app

c nn cti nl ss

simple no connection state at sender receiversmall segment header

connectionlessno handshaking between UDP sender receiver

no congestion control UDP can blast away as fast as desired

each UDP segment handled independently of others

Transport Layer 3-16

of others

UDP moreUDP moreoften used for streaming often used for streaming multimedia apps

loss tolerant source port dest port

32 bits

Length in f rate sensitive

other UDP usesDNS

length checksumbytes of UDPsegmentincluding

h dDNSSNMP

reliable transfer over UDP Application

header

reliable transfer over UDP add reliability at application layer

li ti s ifi

ppdata

(message)application-specific error recovery

UDP segment format

Transport Layer 3-17

UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted

segment

Sendertreat segment contents

f 16 bit

Receivercompute checksum of

i d tas sequence of 16-bit integerschecksum addition (1rsquos

received segmentcheck if computed checksum equals checksum field value(

complement sum) of segment contentssender puts checksum

qNO - error detectedYES - no error detected B t m b s sender puts checksum

value into UDP checksum field

But maybe errors nonetheless More later hellip

Transport Layer 3-18

Internet Checksum ExamplepNote

When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0

wraparound

sumh k

Transport Layer 3-19

1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-20

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-21

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-22

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-23

complexity of reliable data transfer protocol (rdt)

Reliable data transfer getting startedReliable data transfer getting started

rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to

deliver to receiver upper layer

deliver_data() called by rdt to deliver data to upper

send receivesendside

receiveside

udt_send() called by rdtto transfer packet over

li bl h l t i

rdt_rcv() called when packet arrives on rcv-side of channel

Transport Layer 3-24

unreliable channel to receiver

Reliable data transfer getting startedReliable data transfer getting startedWersquoll

incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer

but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver

event causing state transition

state state

gactions taken on state transition

state when in this ldquostaterdquo next state

1state

2state next state

uniquely determined by next event

eventactions

Transport Layer 3-25

Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel

underlying channel perfectly reliabley g p yno bit errorsno loss of packets

separate FSMs for sender receiversender sends data into underlying channel

i d d t f d l i h lreceiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

udt_send(packet)

sender receiver

Transport Layer 3-26

Rdt2 0 channel with bit errorsRdt20 channel with bit errors

underlying channel may flip bits in packety g y p pchecksum to detect bit errors

the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender

Transport Layer 3-27

rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for

receiver

call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowdΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

belowsender

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-28

udt_send(ACK)

rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-29

udt_send(ACK)

rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-30

udt_send(ACK)

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 5: Ch 3Chapter 3 Transport Layer

How demultiplexing worksHow demultiplexing workshost receives IP datagrams

h d t h each datagram has source IP address destination IP address source port dest port

32 bits

each datagram carries 1 transport-layer segmenteach segment has source

other header fieldseach segment has source destination port number

host uses IP addresses amp port applicationnumbers to direct segment to appropriate socket

ppdata

(message)

TCPUDP segment format

Transport Layer 3-9

Connectionless demultiplexingConnectionless demultiplexing

C k i h When host receives UDP Create sockets with port numbers

DatagramSocket mySocket1 = new

When host receives UDP segment

checks destination port DatagramSocket mySocket1 = new DatagramSocket(12534)

DatagramSocket mySocket2 = new DatagramSocket(12535)

pnumber in segmentdirects UDP segment to socket with that port DatagramSocket(12535)

UDP socket identified by two-tuple

socket with that port number

IP datagrams with two tuple(dest IP address dest port number)

gdifferent source IP addresses andor source

t b di t d port numbers directed to same socket

Transport Layer 3-10

Connectionless demux (cont)Connectionless demux (cont)k k k (6428)DatagramSocket serverSocket = new DatagramSocket(6428)

P2 P1P1P3P2 P1P3

SP 6428DP 9157

SP 6428DP 5775

ClientIP B

clientIP A

serverSP 9157DP 6428

SP 5775DP 6428

IPBIP A IP C

SP provides ldquoreturn addressrdquo

Transport Layer 3-11

SP provides return address

Connection-oriented demuxConnection-oriented demux

TCP k id ifi d h TCP socket identified by 4-tuple

source IP address

Server host may support many simultaneous TCP socketssource IP address

source port numberdest IP address

socketseach socket identified by its own 4-tuple

dest port numberrecv host uses all four

Web servers have different sockets for

h i livalues to direct segment to appropriate s ck t

each connecting clientnon-persistent HTTP will have different socket for socket have different socket for each request

Transport Layer 3-12

Connection-oriented demux (cont)

P1 P1P2P4 P5 P6 P3P1 P2P4 P5 P6 P3

SP 5775DP 80D 8

D-IPCS-IP B

ClientIP B

clientIP A

serverSP 9157DP 80

SP 9157DP 80

S IP A S IP B IPBIP A IP CD-IPC

S-IP AD-IPC

S-IP B

Transport Layer 3-13

Connection-oriented demux Threaded Web Server

P1 P1P2P4 P3P1 P2P4 P3

SP 5775DP 80D 8

D-IPCS-IP B

ClientIP B

clientIP A

serverSP 9157DP 80

SP 9157DP 80

S IP A S IP B IPBIP A IP CD-IPC

S-IP AD-IPC

S-IP B

Transport Layer 3-14

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]

ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP

Why is there a UDPno connection

ldquobest effortrdquo service UDP segments may be

lost

establishment (which can add delay)simple no connection state

delivered out of order to app

c nn cti nl ss

simple no connection state at sender receiversmall segment header

connectionlessno handshaking between UDP sender receiver

no congestion control UDP can blast away as fast as desired

each UDP segment handled independently of others

Transport Layer 3-16

of others

UDP moreUDP moreoften used for streaming often used for streaming multimedia apps

loss tolerant source port dest port

32 bits

Length in f rate sensitive

other UDP usesDNS

length checksumbytes of UDPsegmentincluding

h dDNSSNMP

reliable transfer over UDP Application

header

reliable transfer over UDP add reliability at application layer

li ti s ifi

ppdata

(message)application-specific error recovery

UDP segment format

Transport Layer 3-17

UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted

segment

Sendertreat segment contents

f 16 bit

Receivercompute checksum of

i d tas sequence of 16-bit integerschecksum addition (1rsquos

received segmentcheck if computed checksum equals checksum field value(

complement sum) of segment contentssender puts checksum

qNO - error detectedYES - no error detected B t m b s sender puts checksum

value into UDP checksum field

But maybe errors nonetheless More later hellip

Transport Layer 3-18

Internet Checksum ExamplepNote

When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0

wraparound

sumh k

Transport Layer 3-19

1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-20

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-21

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-22

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-23

complexity of reliable data transfer protocol (rdt)

Reliable data transfer getting startedReliable data transfer getting started

rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to

deliver to receiver upper layer

deliver_data() called by rdt to deliver data to upper

send receivesendside

receiveside

udt_send() called by rdtto transfer packet over

li bl h l t i

rdt_rcv() called when packet arrives on rcv-side of channel

Transport Layer 3-24

unreliable channel to receiver

Reliable data transfer getting startedReliable data transfer getting startedWersquoll

incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer

but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver

event causing state transition

state state

gactions taken on state transition

state when in this ldquostaterdquo next state

1state

2state next state

uniquely determined by next event

eventactions

Transport Layer 3-25

Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel

underlying channel perfectly reliabley g p yno bit errorsno loss of packets

separate FSMs for sender receiversender sends data into underlying channel

i d d t f d l i h lreceiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

udt_send(packet)

sender receiver

Transport Layer 3-26

Rdt2 0 channel with bit errorsRdt20 channel with bit errors

underlying channel may flip bits in packety g y p pchecksum to detect bit errors

the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender

Transport Layer 3-27

rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for

receiver

call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowdΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

belowsender

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-28

udt_send(ACK)

rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-29

udt_send(ACK)

rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-30

udt_send(ACK)

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 6: Ch 3Chapter 3 Transport Layer

Connectionless demux (cont)Connectionless demux (cont)k k k (6428)DatagramSocket serverSocket = new DatagramSocket(6428)

P2 P1P1P3P2 P1P3

SP 6428DP 9157

SP 6428DP 5775

ClientIP B

clientIP A

serverSP 9157DP 6428

SP 5775DP 6428

IPBIP A IP C

SP provides ldquoreturn addressrdquo

Transport Layer 3-11

SP provides return address

Connection-oriented demuxConnection-oriented demux

TCP k id ifi d h TCP socket identified by 4-tuple

source IP address

Server host may support many simultaneous TCP socketssource IP address

source port numberdest IP address

socketseach socket identified by its own 4-tuple

dest port numberrecv host uses all four

Web servers have different sockets for

h i livalues to direct segment to appropriate s ck t

each connecting clientnon-persistent HTTP will have different socket for socket have different socket for each request

Transport Layer 3-12

Connection-oriented demux (cont)

P1 P1P2P4 P5 P6 P3P1 P2P4 P5 P6 P3

SP 5775DP 80D 8

D-IPCS-IP B

ClientIP B

clientIP A

serverSP 9157DP 80

SP 9157DP 80

S IP A S IP B IPBIP A IP CD-IPC

S-IP AD-IPC

S-IP B

Transport Layer 3-13

Connection-oriented demux Threaded Web Server

P1 P1P2P4 P3P1 P2P4 P3

SP 5775DP 80D 8

D-IPCS-IP B

ClientIP B

clientIP A

serverSP 9157DP 80

SP 9157DP 80

S IP A S IP B IPBIP A IP CD-IPC

S-IP AD-IPC

S-IP B

Transport Layer 3-14

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]

ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP

Why is there a UDPno connection

ldquobest effortrdquo service UDP segments may be

lost

establishment (which can add delay)simple no connection state

delivered out of order to app

c nn cti nl ss

simple no connection state at sender receiversmall segment header

connectionlessno handshaking between UDP sender receiver

no congestion control UDP can blast away as fast as desired

each UDP segment handled independently of others

Transport Layer 3-16

of others

UDP moreUDP moreoften used for streaming often used for streaming multimedia apps

loss tolerant source port dest port

32 bits

Length in f rate sensitive

other UDP usesDNS

length checksumbytes of UDPsegmentincluding

h dDNSSNMP

reliable transfer over UDP Application

header

reliable transfer over UDP add reliability at application layer

li ti s ifi

ppdata

(message)application-specific error recovery

UDP segment format

Transport Layer 3-17

UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted

segment

Sendertreat segment contents

f 16 bit

Receivercompute checksum of

i d tas sequence of 16-bit integerschecksum addition (1rsquos

received segmentcheck if computed checksum equals checksum field value(

complement sum) of segment contentssender puts checksum

qNO - error detectedYES - no error detected B t m b s sender puts checksum

value into UDP checksum field

But maybe errors nonetheless More later hellip

Transport Layer 3-18

Internet Checksum ExamplepNote

When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0

wraparound

sumh k

Transport Layer 3-19

1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-20

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-21

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-22

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-23

complexity of reliable data transfer protocol (rdt)

Reliable data transfer getting startedReliable data transfer getting started

rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to

deliver to receiver upper layer

deliver_data() called by rdt to deliver data to upper

send receivesendside

receiveside

udt_send() called by rdtto transfer packet over

li bl h l t i

rdt_rcv() called when packet arrives on rcv-side of channel

Transport Layer 3-24

unreliable channel to receiver

Reliable data transfer getting startedReliable data transfer getting startedWersquoll

incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer

but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver

event causing state transition

state state

gactions taken on state transition

state when in this ldquostaterdquo next state

1state

2state next state

uniquely determined by next event

eventactions

Transport Layer 3-25

Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel

underlying channel perfectly reliabley g p yno bit errorsno loss of packets

separate FSMs for sender receiversender sends data into underlying channel

i d d t f d l i h lreceiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

udt_send(packet)

sender receiver

Transport Layer 3-26

Rdt2 0 channel with bit errorsRdt20 channel with bit errors

underlying channel may flip bits in packety g y p pchecksum to detect bit errors

the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender

Transport Layer 3-27

rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for

receiver

call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowdΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

belowsender

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-28

udt_send(ACK)

rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-29

udt_send(ACK)

rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-30

udt_send(ACK)

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 7: Ch 3Chapter 3 Transport Layer

Connection-oriented demux (cont)

P1 P1P2P4 P5 P6 P3P1 P2P4 P5 P6 P3

SP 5775DP 80D 8

D-IPCS-IP B

ClientIP B

clientIP A

serverSP 9157DP 80

SP 9157DP 80

S IP A S IP B IPBIP A IP CD-IPC

S-IP AD-IPC

S-IP B

Transport Layer 3-13

Connection-oriented demux Threaded Web Server

P1 P1P2P4 P3P1 P2P4 P3

SP 5775DP 80D 8

D-IPCS-IP B

ClientIP B

clientIP A

serverSP 9157DP 80

SP 9157DP 80

S IP A S IP B IPBIP A IP CD-IPC

S-IP AD-IPC

S-IP B

Transport Layer 3-14

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]

ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP

Why is there a UDPno connection

ldquobest effortrdquo service UDP segments may be

lost

establishment (which can add delay)simple no connection state

delivered out of order to app

c nn cti nl ss

simple no connection state at sender receiversmall segment header

connectionlessno handshaking between UDP sender receiver

no congestion control UDP can blast away as fast as desired

each UDP segment handled independently of others

Transport Layer 3-16

of others

UDP moreUDP moreoften used for streaming often used for streaming multimedia apps

loss tolerant source port dest port

32 bits

Length in f rate sensitive

other UDP usesDNS

length checksumbytes of UDPsegmentincluding

h dDNSSNMP

reliable transfer over UDP Application

header

reliable transfer over UDP add reliability at application layer

li ti s ifi

ppdata

(message)application-specific error recovery

UDP segment format

Transport Layer 3-17

UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted

segment

Sendertreat segment contents

f 16 bit

Receivercompute checksum of

i d tas sequence of 16-bit integerschecksum addition (1rsquos

received segmentcheck if computed checksum equals checksum field value(

complement sum) of segment contentssender puts checksum

qNO - error detectedYES - no error detected B t m b s sender puts checksum

value into UDP checksum field

But maybe errors nonetheless More later hellip

Transport Layer 3-18

Internet Checksum ExamplepNote

When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0

wraparound

sumh k

Transport Layer 3-19

1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-20

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-21

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-22

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-23

complexity of reliable data transfer protocol (rdt)

Reliable data transfer getting startedReliable data transfer getting started

rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to

deliver to receiver upper layer

deliver_data() called by rdt to deliver data to upper

send receivesendside

receiveside

udt_send() called by rdtto transfer packet over

li bl h l t i

rdt_rcv() called when packet arrives on rcv-side of channel

Transport Layer 3-24

unreliable channel to receiver

Reliable data transfer getting startedReliable data transfer getting startedWersquoll

incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer

but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver

event causing state transition

state state

gactions taken on state transition

state when in this ldquostaterdquo next state

1state

2state next state

uniquely determined by next event

eventactions

Transport Layer 3-25

Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel

underlying channel perfectly reliabley g p yno bit errorsno loss of packets

separate FSMs for sender receiversender sends data into underlying channel

i d d t f d l i h lreceiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

udt_send(packet)

sender receiver

Transport Layer 3-26

Rdt2 0 channel with bit errorsRdt20 channel with bit errors

underlying channel may flip bits in packety g y p pchecksum to detect bit errors

the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender

Transport Layer 3-27

rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for

receiver

call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowdΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

belowsender

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-28

udt_send(ACK)

rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-29

udt_send(ACK)

rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-30

udt_send(ACK)

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 8: Ch 3Chapter 3 Transport Layer

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-15

UDP User Datagram Protocol [RFC 768]UDP User Datagram Protocol [RFC 768]

ldquono frills rdquo ldquobare bonesrdquo no frills bare bones Internet transport protocolldquob st ff trdquo s i UDP

Why is there a UDPno connection

ldquobest effortrdquo service UDP segments may be

lost

establishment (which can add delay)simple no connection state

delivered out of order to app

c nn cti nl ss

simple no connection state at sender receiversmall segment header

connectionlessno handshaking between UDP sender receiver

no congestion control UDP can blast away as fast as desired

each UDP segment handled independently of others

Transport Layer 3-16

of others

UDP moreUDP moreoften used for streaming often used for streaming multimedia apps

loss tolerant source port dest port

32 bits

Length in f rate sensitive

other UDP usesDNS

length checksumbytes of UDPsegmentincluding

h dDNSSNMP

reliable transfer over UDP Application

header

reliable transfer over UDP add reliability at application layer

li ti s ifi

ppdata

(message)application-specific error recovery

UDP segment format

Transport Layer 3-17

UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted

segment

Sendertreat segment contents

f 16 bit

Receivercompute checksum of

i d tas sequence of 16-bit integerschecksum addition (1rsquos

received segmentcheck if computed checksum equals checksum field value(

complement sum) of segment contentssender puts checksum

qNO - error detectedYES - no error detected B t m b s sender puts checksum

value into UDP checksum field

But maybe errors nonetheless More later hellip

Transport Layer 3-18

Internet Checksum ExamplepNote

When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0

wraparound

sumh k

Transport Layer 3-19

1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-20

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-21

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-22

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-23

complexity of reliable data transfer protocol (rdt)

Reliable data transfer getting startedReliable data transfer getting started

rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to

deliver to receiver upper layer

deliver_data() called by rdt to deliver data to upper

send receivesendside

receiveside

udt_send() called by rdtto transfer packet over

li bl h l t i

rdt_rcv() called when packet arrives on rcv-side of channel

Transport Layer 3-24

unreliable channel to receiver

Reliable data transfer getting startedReliable data transfer getting startedWersquoll

incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer

but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver

event causing state transition

state state

gactions taken on state transition

state when in this ldquostaterdquo next state

1state

2state next state

uniquely determined by next event

eventactions

Transport Layer 3-25

Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel

underlying channel perfectly reliabley g p yno bit errorsno loss of packets

separate FSMs for sender receiversender sends data into underlying channel

i d d t f d l i h lreceiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

udt_send(packet)

sender receiver

Transport Layer 3-26

Rdt2 0 channel with bit errorsRdt20 channel with bit errors

underlying channel may flip bits in packety g y p pchecksum to detect bit errors

the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender

Transport Layer 3-27

rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for

receiver

call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowdΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

belowsender

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-28

udt_send(ACK)

rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-29

udt_send(ACK)

rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-30

udt_send(ACK)

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 9: Ch 3Chapter 3 Transport Layer

UDP moreUDP moreoften used for streaming often used for streaming multimedia apps

loss tolerant source port dest port

32 bits

Length in f rate sensitive

other UDP usesDNS

length checksumbytes of UDPsegmentincluding

h dDNSSNMP

reliable transfer over UDP Application

header

reliable transfer over UDP add reliability at application layer

li ti s ifi

ppdata

(message)application-specific error recovery

UDP segment format

Transport Layer 3-17

UDP checksumUDP checksumGoal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted

segment

Sendertreat segment contents

f 16 bit

Receivercompute checksum of

i d tas sequence of 16-bit integerschecksum addition (1rsquos

received segmentcheck if computed checksum equals checksum field value(

complement sum) of segment contentssender puts checksum

qNO - error detectedYES - no error detected B t m b s sender puts checksum

value into UDP checksum field

But maybe errors nonetheless More later hellip

Transport Layer 3-18

Internet Checksum ExamplepNote

When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0

wraparound

sumh k

Transport Layer 3-19

1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-20

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-21

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-22

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-23

complexity of reliable data transfer protocol (rdt)

Reliable data transfer getting startedReliable data transfer getting started

rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to

deliver to receiver upper layer

deliver_data() called by rdt to deliver data to upper

send receivesendside

receiveside

udt_send() called by rdtto transfer packet over

li bl h l t i

rdt_rcv() called when packet arrives on rcv-side of channel

Transport Layer 3-24

unreliable channel to receiver

Reliable data transfer getting startedReliable data transfer getting startedWersquoll

incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer

but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver

event causing state transition

state state

gactions taken on state transition

state when in this ldquostaterdquo next state

1state

2state next state

uniquely determined by next event

eventactions

Transport Layer 3-25

Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel

underlying channel perfectly reliabley g p yno bit errorsno loss of packets

separate FSMs for sender receiversender sends data into underlying channel

i d d t f d l i h lreceiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

udt_send(packet)

sender receiver

Transport Layer 3-26

Rdt2 0 channel with bit errorsRdt20 channel with bit errors

underlying channel may flip bits in packety g y p pchecksum to detect bit errors

the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender

Transport Layer 3-27

rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for

receiver

call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowdΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

belowsender

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-28

udt_send(ACK)

rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-29

udt_send(ACK)

rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-30

udt_send(ACK)

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 10: Ch 3Chapter 3 Transport Layer

Internet Checksum ExamplepNote

When adding numbers a carryout from the When adding numbers a carryout from the most significant bit needs to be added to the resultresult

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0

wraparound

sumh k

Transport Layer 3-19

1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1checksum

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-20

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-21

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-22

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-23

complexity of reliable data transfer protocol (rdt)

Reliable data transfer getting startedReliable data transfer getting started

rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to

deliver to receiver upper layer

deliver_data() called by rdt to deliver data to upper

send receivesendside

receiveside

udt_send() called by rdtto transfer packet over

li bl h l t i

rdt_rcv() called when packet arrives on rcv-side of channel

Transport Layer 3-24

unreliable channel to receiver

Reliable data transfer getting startedReliable data transfer getting startedWersquoll

incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer

but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver

event causing state transition

state state

gactions taken on state transition

state when in this ldquostaterdquo next state

1state

2state next state

uniquely determined by next event

eventactions

Transport Layer 3-25

Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel

underlying channel perfectly reliabley g p yno bit errorsno loss of packets

separate FSMs for sender receiversender sends data into underlying channel

i d d t f d l i h lreceiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

udt_send(packet)

sender receiver

Transport Layer 3-26

Rdt2 0 channel with bit errorsRdt20 channel with bit errors

underlying channel may flip bits in packety g y p pchecksum to detect bit errors

the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender

Transport Layer 3-27

rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for

receiver

call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowdΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

belowsender

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-28

udt_send(ACK)

rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-29

udt_send(ACK)

rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-30

udt_send(ACK)

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 11: Ch 3Chapter 3 Transport Layer

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-21

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-22

complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-23

complexity of reliable data transfer protocol (rdt)

Reliable data transfer getting startedReliable data transfer getting started

rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to

deliver to receiver upper layer

deliver_data() called by rdt to deliver data to upper

send receivesendside

receiveside

udt_send() called by rdtto transfer packet over

li bl h l t i

rdt_rcv() called when packet arrives on rcv-side of channel

Transport Layer 3-24

unreliable channel to receiver

Reliable data transfer getting startedReliable data transfer getting startedWersquoll

incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer

but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver

event causing state transition

state state

gactions taken on state transition

state when in this ldquostaterdquo next state

1state

2state next state

uniquely determined by next event

eventactions

Transport Layer 3-25

Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel

underlying channel perfectly reliabley g p yno bit errorsno loss of packets

separate FSMs for sender receiversender sends data into underlying channel

i d d t f d l i h lreceiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

udt_send(packet)

sender receiver

Transport Layer 3-26

Rdt2 0 channel with bit errorsRdt20 channel with bit errors

underlying channel may flip bits in packety g y p pchecksum to detect bit errors

the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender

Transport Layer 3-27

rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for

receiver

call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowdΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

belowsender

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-28

udt_send(ACK)

rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-29

udt_send(ACK)

rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-30

udt_send(ACK)

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 12: Ch 3Chapter 3 Transport Layer

Principles of Reliable data transferPrinciples of Reliable data transferimportant in app transport link layerstop-10 list of important networking topics

characteristics of unreliable channel will determine

Transport Layer 3-23

complexity of reliable data transfer protocol (rdt)

Reliable data transfer getting startedReliable data transfer getting started

rdt send() called from above deliver data() called by dt_se d() called from above (eg by app) Passed data to

deliver to receiver upper layer

deliver_data() called by rdt to deliver data to upper

send receivesendside

receiveside

udt_send() called by rdtto transfer packet over

li bl h l t i

rdt_rcv() called when packet arrives on rcv-side of channel

Transport Layer 3-24

unreliable channel to receiver

Reliable data transfer getting startedReliable data transfer getting startedWersquoll

incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer

but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver

event causing state transition

state state

gactions taken on state transition

state when in this ldquostaterdquo next state

1state

2state next state

uniquely determined by next event

eventactions

Transport Layer 3-25

Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel

underlying channel perfectly reliabley g p yno bit errorsno loss of packets

separate FSMs for sender receiversender sends data into underlying channel

i d d t f d l i h lreceiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

udt_send(packet)

sender receiver

Transport Layer 3-26

Rdt2 0 channel with bit errorsRdt20 channel with bit errors

underlying channel may flip bits in packety g y p pchecksum to detect bit errors

the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender

Transport Layer 3-27

rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for

receiver

call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowdΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

belowsender

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-28

udt_send(ACK)

rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-29

udt_send(ACK)

rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-30

udt_send(ACK)

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 13: Ch 3Chapter 3 Transport Layer

Reliable data transfer getting startedReliable data transfer getting startedWersquoll

incrementally develop sender receiver sides of reliable data transfer protocol (rdt)consider only unidirectional data transfer

but control info will flow on both directionsuse finite state machines (FSM) to specify sender receiver

event causing state transition

state state

gactions taken on state transition

state when in this ldquostaterdquo next state

1state

2state next state

uniquely determined by next event

eventactions

Transport Layer 3-25

Rdt1 0 reliable transfer over a reliable channelRdt10 reliable transfer over a reliable channel

underlying channel perfectly reliabley g p yno bit errorsno loss of packets

separate FSMs for sender receiversender sends data into underlying channel

i d d t f d l i h lreceiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

udt_send(packet)

sender receiver

Transport Layer 3-26

Rdt2 0 channel with bit errorsRdt20 channel with bit errors

underlying channel may flip bits in packety g y p pchecksum to detect bit errors

the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender

Transport Layer 3-27

rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for

receiver

call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowdΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

belowsender

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-28

udt_send(ACK)

rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-29

udt_send(ACK)

rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-30

udt_send(ACK)

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 14: Ch 3Chapter 3 Transport Layer

Rdt2 0 channel with bit errorsRdt20 channel with bit errors

underlying channel may flip bits in packety g y p pchecksum to detect bit errors

the question how to recover from errorsacknowledgements (ACKs) receiver explicitly tells sender that pkt received OKnegative acknowledgements (NAKs) receiver explicitly negative acknowledgements (NAKs) receiver explicitly tells sender that pkt had errorssender retransmits pkt on receipt of NAK

new mechanisms in rdt20 (beyond rdt10)error detectionreceiver feedback control msgs (ACKNAK) rcvr-gtsender

Transport Layer 3-27

rdt2 0 FSM specificationrdt20 FSM specificationsnkpkt = make pkt(data checksum) receiverrdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for

receiver

call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowdΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

belowsender

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-28

udt_send(ACK)

rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-29

udt_send(ACK)

rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-30

udt_send(ACK)

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 15: Ch 3Chapter 3 Transport Layer

rdt2 0 operation with no errorsrdt20 operation with no errorssnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-29

udt_send(ACK)

rdt2 0 error scenariordt20 error scenariosnkpkt = make pkt(data checksum)

rdt_send(data)

Wait for

snkpkt make_pkt(data checksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)

rdt rcv(rcvpkt) ampampWait for call from above

udt_send(sndpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or

NAK

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from

belowΛ

rdt_rcv(rcvpkt) ampamp t t( kt)

below

extract(rcvpktdata)deliver_data(data)udt send(ACK)

notcorrupt(rcvpkt)

Transport Layer 3-30

udt_send(ACK)

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 16: Ch 3Chapter 3 Transport Layer

rdt2 0 has a fatal flawrdt20 has a fatal flaw

Wh h if H dli d li What happens if ACKNAK corruptedsender doesnrsquot know what

Handling duplicates sender retransmits current pkt if ACKNAK garbledsender doesn t know what

happened at receivercanrsquot just retransmit

pkt if ACKNAK garbledsender adds sequence number to each pkt

possible duplicate receiver discards (doesnrsquot deliver up) duplicate pkt

Sender sends one packet stop and wait

p then waits for receiver response

Transport Layer 3-31

rdt2 1 sender handles garbled ACKNAKsrdt21 sender handles garbled ACKNAKs

sndpkt make pkt(0 data checks m)

rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

Wait for

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )Wait for

call 0 from above

ACK or NAK 0 udt_send(sndpkt)

isNAK(rcvpkt) )

rdt rcv(rcvpkt) rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) _ ( p )

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

ΛΛ

dt d(d t )

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

Wait forcall 1 from

above

Wait for ACK or NAK 1

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

( p ( p ) ||isNAK(rcvpkt) )

Transport Layer 3-32

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 17: Ch 3Chapter 3 Transport Layer

rdt2 1 receiver handles garbled ACKNAKsrdt21 receiver handles garbled ACKNAKsrdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)

dt d( d kt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

udt_send(sndpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq0(rcvpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamphas seq1(rcvpkt) has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make pkt(ACK chksum)

Transport Layer 3-33

sndpkt make_pkt(ACK chksum)udt_send(sndpkt)

rdt2 1 discussionrdt21 discussion

d R iSenderseq added to pkt

Receivermust check if received

k t is d li ttwo seq rsquos (01) will suffice Why

t h k if i d

packet is duplicatestate indicates whether 0 or 1 is expected pkt must check if received

ACKNAK corrupted twice as many states

or s p ct p t seq

note receiver can notk f l twice as many states

state must ldquorememberrdquo whether ldquocurrentrdquo pkt

know if its last ACKNAK received OK at senderp

has 0 or 1 seq at sender

Transport Layer 3-34

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 18: Ch 3Chapter 3 Transport Layer

rdt2 2 a NAK-free protocolrdt22 a NAK free protocol

f ti lit dt2 1 i ACK lsame functionality as rdt21 using ACKs onlyinstead of NAK receiver sends ACK for last pkt received OKreceived OK

receiver must explicitly include seq of pkt being ACKed duplicate ACK at sender results in same action as duplicate ACK at sender results in same action as NAK retransmit current pkt

Transport Layer 3-35

rdt22 sender receiver fragmentsrdt22 sender receiver fragments

sndpkt = make pkt(0 data checksum)rdt_send(data)

Wait for

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt) rdt_rcv(rcvpkt) ampamp

( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for ACKcall 0 from

above udt_send(sndpkt)

isACK(rcvpkt1) )

rdt rcv(rcvpkt)

ACK0

sender FSMfragment rdt_rcv(rcvpkt)

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

fragment

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || Λ

Wait for 0 from below

(corrupt(rcvpkt) ||has_seq1(rcvpkt))

udt_send(sndpkt)

receiver FSMfragment

Λ

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)

Transport Layer 3-36

deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 19: Ch 3Chapter 3 Transport Layer

rdt3 0 channels with errors and lossrdt30 channels with errors and loss

N i h d i New assumptionunderlying channel can also lose packets (data

Approach sender waits ldquoreasonablerdquo amount of time for ACK also lose packets (data

or ACKs)checksum seq ACKs

time for ACK retransmits if no ACK received in this time q

retransmissions will be of help but not enough

if pkt (or ACK) just delayed (not lost)

retransmission will be retransmission will be duplicate but use of seq rsquos already handles thisreceiver must specify seq of pkt being ACKed

requires countdown timer

Transport Layer 3-37

requires countdown timer

rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)

dt d( d kt)

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||

C ( 1) )udt_send(sndpkt)start_timer

Wait for

isACK(rcvpkt1) )

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

ΛΛ

for ACK0

rdt_rcv(rcvpkt) ampamp t t( kt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt 1)

udt_send(sndpkt)start_timer

call 0from above

Wait for

ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

ampamp isACK(rcvpkt1)

stop_timerstop_timer

Wait Wait for call 1 from

above

rdt send(data)

udt_send(sndpkt)start_timer

timeout for ACK1

Λrdt_rcv(rcvpkt)

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

Λ

Transport Layer 3-38

Λ

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 20: Ch 3Chapter 3 Transport Layer

rdt3 0 in actionrdt30 in action

Transport Layer 3-39

rdt3 0 in actionrdt30 in action

Transport Layer 3-40

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 21: Ch 3Chapter 3 Transport Layer

Performance of rdt3 0Performance of rdt30

d 3 0 k b f i krdt30 works but performance stinksex 1 Gbps link 15 ms prop delay 8000 bit packet

dsmicrosecon8bps10

bits80009

===R

Ldtrans

U sender utilization ndash fraction of time sender busy sending

bps10R

U sender =

008

30008 = 000027 L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps linknetwork protocol limits use of physical resources

Transport Layer 3-41

rdt3 0 stop-and-wait operationrdt30 stop and wait operationsender receiver

first packet bit transmitted t = 0last packet bit transmitted t = L R

RTTfirst packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L Rp

U sender =

008

30008 = 000027 L R

RTT + L R =

Transport Layer 3-42

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 22: Ch 3Chapter 3 Transport Layer

Pipelined protocolsPipelined protocolsPipelining sender allows multiple ldquoin-flightrdquo yet-to-p g p g y

be-acknowledged pktsrange of sequence numbers must be increasedb ff i d d ibuffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 3-43

selective repeat

Pipelining increased utilizationPipelining increased utilization

first packet bit transmitted t = 0

sender receiver

first packet bit transmitted t = 0last bit transmitted t = L R

RTT first packet bit arriveslast packet bit arrives send ACKlast bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 3rd packet arrives send ACK

Increase utilizationby a factor of 3

U sender =

024

30008 = 00008 3 L R

RTT + L R =

Transport Layer 3-44

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 23: Ch 3Chapter 3 Transport Layer

Pipelining ProtocolsPipelining Protocols

G b k N bi i t S l ti R t bi iGo-back-N big pictureSender can have up to N unacked packets in

Selective Repeat big picSender can have up to N unacked packets in N unacked packets in

pipelineRcvr only sends

N unacked packets in pipelineRcvr acks individual y

cumulative acksDoesnrsquot ack packet if therersquos a gap

packetsSender maintains timer for each there s a gap

Sender has timer for oldest unacked packet

timer for each unacked packet

When timer expires If timer expires retransmit all unacked packets

pretransmit only unack packet

Transport Layer 3-45

Selective repeat big pictureSelective repeat big picture

d h N k d k Sender can have up to N unacked packets in pipelineRcvr acks individual packetsSender maintains timer for each unacked Sender maintains timer for each unacked packet

When timer expires retransmit only unack When timer expires retransmit only unack packet

Transport Layer 3-46

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 24: Ch 3Chapter 3 Transport Layer

Go-Back-NGo-Back-NSender

k-bit seq in pkt headerldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo i d li t ACK ( i )may receive duplicate ACKs (see receiver)

timer for each in-flight pkttimeout(n) retransmit pkt n and all higher seq pkts in window

Transport Layer 3-47

timeout(n) retransmit pkt n and all higher seq pkts in window

GBN sender extended FSMrdt_send(data)

if (nextseqnum lt base+N) ( )sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum)

udt_send(sndpkt[nextseqnum])if (base == nextseqnum)

start_timernextseqnum++

elserefuse_data(data)

base=1

Λ

Wait start_timerudt_send(sndpkt[base])

timeout

base 1nextseqnum=1

udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

rdt rcv(rcvpkt) ampamp

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

base = getacknum(rcvpkt)+1If (base == nextseqnum)

t ti

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

Transport Layer 3-48

stop_timerelsestart_timer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 25: Ch 3Chapter 3 Transport Layer

GBN receiver extended FSMGBN receiver extended FSM

udt_send(sndpkt)

default

rdt rcv(rcvpkt)

Wait

_ ( p ) rdt_rcv(rcvpkt)ampamp notcurrupt(rcvpkt)ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)expectedseqnum=1

Λ

deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

sndpkt = make_pkt(expectedseqnumACKchksum)

ACK-only always send ACK for correctly-received pkt with highest in-order seq with highest in order seq

may generate duplicate ACKsneed only remember expectedseqnumy

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering

Transport Layer 3-49

Re-ACK pkt with highest in-order seq

GBN inGBN inaction

Transport Layer 3-50

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 26: Ch 3Chapter 3 Transport Layer

Selective RepeatSelective Repeat

receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts

buffers pkts as needed for eventual in-order delivery p yto upper layer

sender only resends pkts for which ACK not i dreceived

sender timer for each unACKed pkts nd indsender window

N consecutive seq rsquosagain limits seq s of sent unACKed pktsagain limits seq s of sent unACKed pkts

Transport Layer 3-51

Selective repeat sender receiver windowsSelective repeat sender receiver windows

Transport Layer 3-52

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 27: Ch 3Chapter 3 Transport Layer

Selective repeatS ct r p at

d f b sender

kt i receiver

data from above if next available seq in window send pkt

pkt n in [rcvbase rcvbase+N-1]

send ACK(n)out of order bufferwindow send pkt

timeout(n)resend pkt n restart timer

out-of-order bufferin-order deliver (also deliver buffered in-order r s n p t n r start t m r

ACK(n) in [sendbasesendbase+N]

mark pkt n as received

pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase N rcvbase 1]pif n smallest unACKed pkt advance window base to next unACKed seq

pkt n in [rcvbase-Nrcvbase-1]

ACK(n)otherwisenext unACKed seq otherwise

ignore

Transport Layer 3-53

Selective repeat in actionp

Transport Layer 3-54

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 28: Ch 3Chapter 3 Transport Layer

Selective repeatdilemma

Example Example seq rsquos 0 1 2 3window size=3window size 3

receiver sees no difference in two scenariosincorrectly passes incorrectly passes duplicate data as new in (a)

Q what relationship between seq size

Transport Layer 3-55

between seq size and window size

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-56

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 29: Ch 3Chapter 3 Transport Layer

TCP Overview RFCs 793 1122 1323 2018 2581TCP Overview RFCs 793 1122 1323 2018 2581

f ll d l d ti t t i t full duplex databi-directional data flow in same connection

point-to-pointone sender one receiver

reliable in order byte n same connect onMSS maximum segment size

reliable in-order byte steam

no ldquomessage boundariesrdquoconnection-oriented

handshaking (exchange of control msgs) initrsquos

no message boundar espipelined

TCP congestion and flow of control msgs) init s sender receiver state before data exchange

gcontrol set window size

send amp receive buffersflow controlled

sender will not overwhelm receiver

socketdoor

TCP TCP

socketdoor

applicationwrites data

applicationreads data

Transport Layer 3-57

overwhelm receiverTCPsend buffer

TCPreceive buffer

segment

TCP segment structure

t d t t

32 bitsURG urgent data countingsource port dest port

sequence numberacknowledgement number

g(generally not used)

ACK ACK lid

countingby bytes of data(not segments)acknowledgement number

Receive windowUrg data pnterchecksum

FSRPAUheadlen

notused

validPSH push data now(generally not used) bytes

illin

(not segments)

Urg data pnter

Options (variable length)RST SYN FINconnection estab( d

rcvr willingto accept

applicationdata

(setup teardowncommands)

Internet data (variable length)

Internetchecksum

(as in UDP)

Transport Layer 3-58

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 30: Ch 3Chapter 3 Transport Layer

TCP seq rsquos and ACKsTCP seq s and ACKsSeq rsquos Host A Host B

byte stream ldquonumberrdquo of first byte in segmentrsquos

Host A Host B

Usertypes

lsquoCrsquoy gmdata

ACKs f t b t

Chost ACKsreceipt oflsquoCrsquo echoes

b k lsquoCrsquoseq of next byte expected from other side host ACKs

back lsquoCrsquo

cumulative ACKQ how receiver handles

out of order segments

host ACKsreceipt

of echoedlsquoCrsquo

out-of-order segmentsA TCP spec doesnrsquot say - up to time

simple telnet scenario

Transport Layer 3-59

implementor simple telnet scenario

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Q how to set TCP Q how to estimate RTTQ how to set TCP timeout valuelonger than RTT

QSampleRTT measured time from segment transmission until ACK receiptbut RTT varies

too short premature timeout

receiptignore retransmissions

SampleRTT will vary want timeoutunnecessary retransmissions

p yestimated RTT ldquosmootherrdquo

average several recent measurements not just too long slow reaction

to segment lossmeasurements not just current SampleRTT

Transport Layer 3-60

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 31: Ch 3Chapter 3 Transport Layer

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

EstimatedRTT = (1- α)EstimatedRTT + αSampleRTTEstimatedRTT (1 α) EstimatedRTT + α SampleRTT

Exponential weighted moving averageinfluence of past sample decreases exponentially fasttypical value α = 0125

Transport Layer 3-61

Example RTT estimationExample RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

350

300

350

250

onds

)

200RTT

(mill

isec

o

150

100

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

Transport Layer 3-62

SampleRTT Estimated RTT

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 32: Ch 3Chapter 3 Transport Layer

TCP Round Trip Time and TimeoutTCP Round Trip Time and Timeout

Setting the timeoutgEstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety marginfirst estimate of how much SampleRTT deviates from EstimatedRTT

DevRTT = (1-β)DevRTT +β|SampleRTT-EstimatedRTT|

(typically β = 025)

Then set timeout interval

TimeoutInterval = EstimatedRTT + 4DevRTT

Then set timeout interval

Transport Layer 3-63

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-64

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 33: Ch 3Chapter 3 Transport Layer

TCP reliable data transferTCP reliable data transfer

TCP d R i i TCP creates rdt service on top of IPrsquos unreliable service

Retransmissions are triggered by

timeout eventsunreliable servicePipelined segmentsCumulative acks

timeout eventsduplicate acks

Initially consider Cumulative acksTCP uses single retransmission timer

Initially consider simplified TCP sender

ignore duplicate acksretransmission timer pignore flow control congestion control

Transport Layer 3-65

TCP sender eventsdata rcvd from app

Create segment with timeout

retransmit segment Create segment with seq seq is byte-stream

retransmit segment that caused timeoutrestart timerseq is byte-stream

number of first data byte in segment

restart timerAck rcvd

If acknowledges y gstart timer if not already running (think

If acknowledges previously unacked segments

of timer as for oldest unacked segment)

i i i l

gupdate what is known to be ackedt t ti if th expiration interval

TimeOutInterval start timer if there are outstanding segments

Transport Layer 3-66

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 34: Ch 3Chapter 3 Transport Layer

TCP NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

TCP sender

loop (forever) switch(event)

d i d f li i b

(simplified)event data received from application above

create TCP segment with sequence number NextSeqNum if (timer currently not running)

start timerstart timerpass segment to IP NextSeqNum = NextSeqNum + length(data)

t ti ti t

Commentbull SendBase-1 last cumulatively event timer timeout

retransmit not-yet-acknowledged segment with smallest sequence number

start timer

cumulatively ackrsquoed byteExamplebull SendBase-1 = 71start timer

event ACK received with ACK field value of y if (y gt SendBase)

S dB

SendBase 1 71y= 73 so the rcvrwants 73+ y gt SendBase so

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer

y that new data is acked

Transport Layer 3-67

end of loop forever

TCP retransmission scenarios retransm ss on scenar osHost A Host BHost A Host B

eout

eq=9

2 ti

me

l

tim

eout

X

Seloss

ut

Sendbase= 100

=92

tim

eou

SendBase= 120

100

premature timeout

Seq

SendBase= 100 SendBase

= 120

Transport Layer 3-68

timep

lost ACK scenariotime

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 35: Ch 3Chapter 3 Transport Layer

TCP retransmission scenarios (more)TCP retransmission scenarios (more)Host A Host B

l

tim

eout

Xloss

SendBase = 120

Cumulative ACK scenariotime

Transport Layer 3-69

TCP ACK generation [RFC 1122 RFC 2581]TCP ACK generation [RFC 1122 RFC 2581]

E t t R i TCP R i tiEvent at Receiver

Arrival of in-order segment with

TCP Receiver action

Delayed ACK Wait up to 500msf t t If t texpected seq All data up to

expected seq already ACKed

A i l f i d t ith

for next segment If no next segmentsend ACK

Immediatel send single c m lati eArrival of in-order segment withexpected seq One other segment has ACK pending

Immediately send single cumulative ACK ACKing both in-order segments

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Immediately send duplicate ACK indicating seq of next expected byte

Gap detected

Arrival of segment that partially or completely fills gap

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 3-70

partially or completely fills gap segment starts at lower end of gap

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 36: Ch 3Chapter 3 Transport Layer

Fast RetransmitFast Retransmit

Ti i d f If d i 3 Time-out period often relatively long

long delay before

If sender receives 3 ACKs for the same data it supposes that long delay before

resending lost packetDetect lost segments

data it supposes that segment after ACKed data was lostgm

via duplicate ACKsSender often sends

t b k t

fast retransmit resend segment before timer expiresmany segments back-to-

backIf segment is lost

expires

f gm there will likely be many duplicate ACKs

Transport Layer 3-71

Host A Host B

X

tim

eout

t

time

Transport Layer 3-72Figure 337 Resending a segment after triple duplicate ACK

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 37: Ch 3Chapter 3 Transport Layer

Fast retransmit algorithmFast retransmit algorithm

event ACK received with ACK field value of y if (y gt SendBase)

SendBase = ySendBase = yif (there are currently not-yet-acknowledged segments)

start timer

else increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) ( p y )

resend segment with sequence number y

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 3-73

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-74

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 38: Ch 3Chapter 3 Transport Layer

TCP Flow ControlTCP Flow Control

i id f TCP sender wonrsquot overflowflow control

receive side of TCP connection has a receive buffer

sender won t overflowreceiverrsquos buffer by

transmitting too muchtoo fastreceive buffer

d t hi

too fast

speed-matching service matching the send rate to the send rate to the receiving apprsquos drain rateapp process may be app process may be

slow at reading from buffer

Transport Layer 3-75

buffer

TCP Flow control how it worksTCP Flow control how it worksRcvr advertises spare Rcvr advertises spare room by including value of RcvWindow in

(S TCP i

segmentsSender limits unACKed

(Suppose TCP receiver discards out-of-order segments)

data to RcvWindowguarantees receive buffer doesnrsquot overflowsegments)

spare room in buffer= RcvWindow

buffer doesn t overflow

RcvWindow

= RcvBuffer-[LastByteRcvd -LastByteRead]

Transport Layer 3-76

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 39: Ch 3Chapter 3 Transport Layer

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-77

TCP Connection ManagementRecall TCP sender receiver

establish ldquoconnectionrdquo b f h i d t

Three way handshakeStep 1 client host sends TCP before exchanging data

segmentsinitialize TCP variables

Step 1 client host sends TCP SYN segment to server

specifies initial seq seq sbuffers flow control i f ( R Wi d )

pno data

Step 2 server host receives YN li i h YNACK info (eg RcvWindow)

client connection initiatorSocket clientSocket = new

SYN replies with SYNACK segment

server allocates buffersSocket(hostnameport

number)

server contacted by client

server allocates buffersspecifies server initial seq

server contacted by clientSocket connectionSocket = welcomeSocketaccept()

Step 3 client receives SYNACK replies with ACK segment which may contain data

Transport Layer 3-78

m y

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 40: Ch 3Chapter 3 Transport Layer

TCP C ti M t ( t )TCP Connection Management (cont)

Closing a connection

client closes socket

client server

closeclient closes socketclientSocketclose()

Step 1 client end system lStep 1 client end system sends TCP FIN control segment to server

close

Step 2 server receives FIN replies with ACK ed

wai

t

pCloses connection sends FIN closed

tim

e

Transport Layer 3-79

TCP C ti M t ( t )TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

client server

closingEnters ldquotimed waitrdquo -will respond with ACK to received FINs l ito received FINs

Step 4 server receives ACK Connection closed

closing

ACK Connection closed

Note with small modification can handle ed

wai

t

closedmodification can handle simultaneous FINs

closed

tim

e

Transport Layer 3-80

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 41: Ch 3Chapter 3 Transport Layer

TCP Connection Management (cont)TCP Connection Management (cont)

TCP serverlifecycle

TCP clientlifecyclelifecycle

Transport Layer 3-81

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-82

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 42: Ch 3Chapter 3 Transport Layer

Principles of Congestion ControlPrinciples of Congestion Control

Congestioninformally ldquotoo many sources sending too much d f f k h dl rdquodata too fast for network to handlerdquodifferent from flow controlmanifestations

lost packets (buffer overflow at routers)long delays (queueing in router buffers)

a top-10 problem

Transport Layer 3-83

Causescosts of congestion scenario 1Causescosts of congestion scenario 1

two senders two Host A

λin original dataλouttwo senders two

receiversone router

unlimited shared output link buffers

Host Bone router infinite buffers no retransmission

l d l large delays when congested

i maximum achievable throughput

Transport Layer 3-84

throughput

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 43: Ch 3Chapter 3 Transport Layer

Causescosts of congestion scenario 2Causescosts of congestion scenario 2

one router finite buffers one router finite buffers sender retransmission of lost packet

Host A λin original data

λout

finite shared output Host B

λin original data plus retransmitted data

link buffers

Transport Layer 3-85

Causescosts of congestion scenario 2galways (goodput)ldquoperfectrdquo retransmission only when loss

λin

λout

=

λ λgtp f m y

retransmission of delayed (not lost) packet makes larger (than perfect case) for same

λin

λout

gtλ

inλ( p f ) f λ

outR2R2 R2

λ out

λ out

λ out

R4

R3

R2λin

b

R2λin

a

R2λin

c

ldquocostsrdquo of congestionmore work (retrans) for given ldquogoodputrdquo

ba c

Transport Layer 3-86

punneeded retransmissions link carries multiple copies of pkt

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 44: Ch 3Chapter 3 Transport Layer

Causescosts of congestion scenario 3Causescosts of congestion scenario 3four senders λ

inQ what happens as

multihop pathstimeoutretransmit

inQ pp

and increase λin

Host Aλin original data

λout

λin original data plus retransmitted data

finite shared output link buffers

Host B

Transport Layer 3-87

Causescosts of congestion scenario 3Causescosts of congestion scenario 3Ho

λost A

H

ou

t

Host B

Another ldquocostrdquo of congestionh k d d ldquo i i when packet dropped any ldquoupstream transmission

capacity used for that packet was wasted

Transport Layer 3-88

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 45: Ch 3Chapter 3 Transport Layer

Approaches towards congestion controlApproaches towards congestion control

Two broad approaches towards congestion control

End-end congestion Network-assisted

Two broad approaches towards congestion control

gcontrolno explicit feedback from network

congestion controlrouters provide feedback to end systemsnetwork

congestion inferred from end-system observed loss

to end systemssingle bit indicating congestion (SNA

delayapproach taken by TCP

DECbit TCPIP ECN ATM)explicit rate sender explicit rate sender should send at

Transport Layer 3-89

Case study ATM ABR congestion controlCase study ATM ABR congestion control

BR il bl bi RM ( ) ABR available bit rateldquoelastic servicerdquo if senderrsquos path

RM (resource management) cellssent by sender interspersed if sender s path

ldquounderloadedrdquo sender should use

sent by sender interspersed with data cellsbits in RM cell set by switches

available bandwidthif senderrsquos path congested

(ldquonetwork-assistedrdquo) NI bit no increase in rate (mild congestion)congested

sender throttled to minimum guaranteed

t

(mild congestion)CI bit congestion indication

rate RM cells returned to sender by receiver with bits intact

Transport Layer 3-90

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 46: Ch 3Chapter 3 Transport Layer

Case study ATM ABR congestion controlCase study ATM ABR congestion control

two-byte ER (explicit rate) field in RM celly pcongested switch may lower ER value in cellsenderrsquo send rate thus maximum supportable rate on path

llEFCI bit in data cells set to 1 in congested switchif data cell preceding RM cell has EFCI set sender sets CI bit in returned RM cell

Transport Layer 3-91

bit in returned RM cell

Chapter 3 outlineChapter 3 outline

3 1 T l 3 5 C i i d 31 Transport-layer services3 2 M lti l i d

35 Connection-oriented transport TCP

segment structure32 Multiplexing and demultiplexing3 3 Connectionless

segment structurereliable data transferflow control33 Connectionless

transport UDP3 4 Principles of

fconnection management

36 Principles of 34 Principles of reliable data transfer

pcongestion control37 TCP congestion

lcontrol

Transport Layer 3-92

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 47: Ch 3Chapter 3 Transport Layer

TCP congestion control additive increase gmultiplicative decrease

Approach increase transmission rate (window size) pproach ncr as transm ss on rat (w n ow s z ) probing for usable bandwidth until loss occurs

additive increase increase CongWin by 1 MSS g yevery RTT until loss detectedmultiplicative decrease cut CongWin in half after

24 Kbytes

congestionwindow

loss w

siz

e

16 Kbytes

on w

indo

w

Saw toothbehavior probing

for bandwidth8 Kbytes

titimeco

nges

tiofor bandwidth

Transport Layer 3-93

timetime

TCP Congestion Control detailsTCP Congestion Control details

d li i i i H d d sender limits transmissionLastByteSent-LastByteAcked

le CongWin

How does sender perceive congestionl ss t ti t le CongWin

Roughlyloss event = timeout or3 duplicate acksTCP sender reduces C Wi

CongWin is dynamic function

TCP sender reduces rate (CongWin) after loss event

rate = CongWinRTT Bytessec

CongWin is dynamic function of perceived network congestion

nthree mechanisms

AIMDcongestionslow startconservative after time ut events

Transport Layer 3-94

timeout events

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 48: Ch 3Chapter 3 Transport Layer

TCP Slow StartTCP Slow Start

Wh i b i Wh i b i When connection begins CongWin = 1 MSS

Example MSS = 500

When connection begins increase rate exponentially fast until Example MSS = 500

bytes amp RTT = 200 msecinitial rate = 20 kbps

exponentially fast until first loss event

available bandwidth may be gtgt MSSRTT

desirable to quickly ramp up to respectable rate

Transport Layer 3-95

TCP Slow Start (more)TCP Slow Start (more)

Wh i When connection begins increase rate exponentially until

Host A

T

Host B

exponentially until first loss event

double CongWin every

RTT

g yRTTdone by incrementing CongWin for every ACK CongWin for every ACK received

Summary initial rate yis slow but ramps up exponentially fast time

Transport Layer 3-96

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 49: Ch 3Chapter 3 Transport Layer

Refinement inferring lossRefinement inferring lossAfter 3 dup ACKsAfter 3 dup ACKs

CongWin is cut in halfi d th s

Philosophywindow then grows linearly

But after timeout event3 dup ACKs indicates

network capable of But after timeout eventCongWin instead set to 1 MSS

n tw r capa f delivering some segments

timeout indicates a ldquo l rdquo 1 MSS

window then grows exponentially

ldquomore alarmingrdquo congestion scenario

exponentiallyto a threshold then grows linearly

Transport Layer 3-97

g y

RefinementQ When should the

exponential exponential increase switch to linear

A When CongWingets to 12 of its value before timeout

ImplementationImplementationVariable Threshold At loss event Threshold is set to 12 of CongWin just before loss event

Transport Layer 3-98

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 50: Ch 3Chapter 3 Transport Layer

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearlycongestion avoidance phase window grows linearly

When a triple duplicate ACK occurs Thresholdset to CongWin2 and CongWin set to set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 3-99

TCP sender congestion controlgState Event TCP Sender Action Commentary

Sl St t ACK i t C Wi C Wi + MSS R lti i d bli fSlow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold)

set state to ldquoCongestion Avoidance

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidance

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSSduplicate

ACKAvoidance drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSS

Enter slow start

Set state to ldquoSlow Start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 3-100

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 51: Ch 3Chapter 3 Transport Layer

TCP throughputTCP throughput

h rsquo h h h f P Whatrsquos the average throughout of TCP as a function of window size and RTT

Ignore slow startLet W be the window size when loss occursWhen window is W throughput is WRTTJust after loss window drops to W2 Just after loss window drops to W2 throughput to W2RTT Avera e thr u h ut 75 WRTTAverage throughout 75 WRTT

Transport Layer 3-101

TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes

E l 1500 b 100 RTT 10 Example 1500 byte segments 100ms RTT want 10 Gbps throughputR i s i d si W 83 333 i fli ht Requires window size W = 83333 in-flight segmentsThroughput in terms of loss rateThroughput in terms of loss rate

MSSsdot221

L = 210-10 WowLRTT

New versions of TCP for high-speed

Transport Layer 3-102

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 52: Ch 3Chapter 3 Transport Layer

TCP FairnessFairness goal if K TCP sessions share same

TCP FairnessFairness goal if K TCP sessions share same

bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

it R

TCP connection 2 capacity Rconnect on

Transport Layer 3-103

Why is TCP fairWhy is TCP fairTwo competing sessionsTwo competing sessions

Additive increase gives slope of 1 as throughout increasesmultiplicative decrease decreases throughput proportionally

R equal bandwidth share

i id ddi i iloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increase

RC nn ti n 1 th hp t

Transport Layer 3-104

RConnection 1 throughput

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP

Page 53: Ch 3Chapter 3 Transport Layer

Fairness (more)Fairness and UDP

M lti di ft Fairness and parallel TCP

connectionsMultimedia apps often do not use TCP

do not want rate

connectionsnothing prevents app from opening parallel do not want rate

throttled by congestion control

Instead use UDP

p g pconnections between 2 hostsWeb browsers do this Instead use UDP

pump audiovideo at constant rate tolerate

k t l

Web browsers do this Example link of rate R supporting 9 connections packet loss

Research area TCP friendly

supporting 9 connections new app asks for 1 TCP gets rate R10new app asks for 11 TCPs friendly new app asks for 11 TCPs gets R2

Transport Layer 3-105

Chapter 3 SummaryChapter 3 Summaryprinciples behind transport p p player services

multiplexing p gdemultiplexingreliable data transferflow controlcongestion control

Nextleaving the network

instantiation and implementation in the I

gldquoedgerdquo (application transport layers)

InternetUDP

P

into the network ldquocorerdquo

Transport Layer 3-106

TCP