transport layer by ossi mokryn, based also on slides from: the computer networking: a top down...

84
Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Upload: amari-reams

Post on 14-Dec-2015

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Transport Layer

By Ossi Mokryn Based also on slides from the Computer Networking

A Top Down Approach Featuring the Internet by Kurose and Ross

Transport Layer

Connectionless and connection oriented communication

Sockets programming UDP TCP

Reliable communication Flow control Congestion control Timers

2Transport Layer

Transport services and protocols provide logical

communication between app processes running on different hosts

transport protocols run in end systems send side breaks app

messages into segments passes to network layer

rcv side reassembles segments into messages passes to app layer

more than one transport protocol available to apps Internet TCP and UDP

application

transportnetworkdata linkphysical

application

transportnetworkdata linkphysical

logical end-end transport

Transport Layer 3

Internet transport-layer protocols reliable in-order

delivery (TCP) congestion control flow control connection setup

unreliable unordered delivery UDP no-frills extension of

ldquobest-effortrdquo IP services not available

delay guarantees bandwidth guarantees

application

transportnetworkdata linkphysical network

data linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

application

transportnetworkdata linkphysical

logical end-end transport

Transport Layer 4

Transport vs network layer

network layer logical communication between hosts

transport layer logical communication between processes relies on enhances network layer services

Household analogy12 kids sending letters

to 12 kids processes = kids app messages =

letters in envelopes hosts = houses transport protocol =

Ann and Bill network-layer

protocol = postal service

Transport Layer 5

Multiplexingdemultiplexing

application

transport

network

link

physical

P1 application

transport

network

link

physical

application

transport

network

link

physical

P2P3 P4P1

host 1 host 2 host 3

= process= socket

delivering received segmentsto correct socket

Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)

Multiplexing at send host

Transport Layer 6

How demultiplexing works host receives IP datagrams

each datagram has source IP address destination IP address

each datagram carries 1 transport-layer segment

each segment has source destination port number

host uses IP addresses amp port numbers to direct segment to appropriate socket

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

Transport Layer 7

Connectionless demultiplexing Create sockets with port

numbers UDP socket identified by

two-tuple(dest IP address dest port number)

When host receives UDP segment checks destination port

number in segment directs UDP segment to

socket with that port number

IP datagrams with different source IP addresses andor source port numbers directed to same socket

Transport Layer 8

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428

DP 9157

SP 9157

DP 6428

SP 6428

DP 5775

SP 5775

DP 6428

SP provides ldquoreturn addressrdquo

Transport Layer 9

UDP User Datagram Protocol [RFC 768]

Simplest Internet transport protocol

Each app Output produces exactly one UDP segment

ldquobest effortrdquo service UDP segments may be lost delivered out of order to

app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

Transport Layer 10

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header and dataMinimum value is 8

Bytes

Transport Layer 11

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Transport Layer 12

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Transport Layer 13

Sockets Programming over UDP ndash use socket slides now

Transmission Control Protocol

Principles of reliable communication TCP basic notations 3 way handshake TCP flow control congestion control

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 2: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Transport Layer

Connectionless and connection oriented communication

Sockets programming UDP TCP

Reliable communication Flow control Congestion control Timers

2Transport Layer

Transport services and protocols provide logical

communication between app processes running on different hosts

transport protocols run in end systems send side breaks app

messages into segments passes to network layer

rcv side reassembles segments into messages passes to app layer

more than one transport protocol available to apps Internet TCP and UDP

application

transportnetworkdata linkphysical

application

transportnetworkdata linkphysical

logical end-end transport

Transport Layer 3

Internet transport-layer protocols reliable in-order

delivery (TCP) congestion control flow control connection setup

unreliable unordered delivery UDP no-frills extension of

ldquobest-effortrdquo IP services not available

delay guarantees bandwidth guarantees

application

transportnetworkdata linkphysical network

data linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

application

transportnetworkdata linkphysical

logical end-end transport

Transport Layer 4

Transport vs network layer

network layer logical communication between hosts

transport layer logical communication between processes relies on enhances network layer services

Household analogy12 kids sending letters

to 12 kids processes = kids app messages =

letters in envelopes hosts = houses transport protocol =

Ann and Bill network-layer

protocol = postal service

Transport Layer 5

Multiplexingdemultiplexing

application

transport

network

link

physical

P1 application

transport

network

link

physical

application

transport

network

link

physical

P2P3 P4P1

host 1 host 2 host 3

= process= socket

delivering received segmentsto correct socket

Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)

Multiplexing at send host

Transport Layer 6

How demultiplexing works host receives IP datagrams

each datagram has source IP address destination IP address

each datagram carries 1 transport-layer segment

each segment has source destination port number

host uses IP addresses amp port numbers to direct segment to appropriate socket

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

Transport Layer 7

Connectionless demultiplexing Create sockets with port

numbers UDP socket identified by

two-tuple(dest IP address dest port number)

When host receives UDP segment checks destination port

number in segment directs UDP segment to

socket with that port number

IP datagrams with different source IP addresses andor source port numbers directed to same socket

Transport Layer 8

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428

DP 9157

SP 9157

DP 6428

SP 6428

DP 5775

SP 5775

DP 6428

SP provides ldquoreturn addressrdquo

Transport Layer 9

UDP User Datagram Protocol [RFC 768]

Simplest Internet transport protocol

Each app Output produces exactly one UDP segment

ldquobest effortrdquo service UDP segments may be lost delivered out of order to

app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

Transport Layer 10

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header and dataMinimum value is 8

Bytes

Transport Layer 11

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Transport Layer 12

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Transport Layer 13

Sockets Programming over UDP ndash use socket slides now

Transmission Control Protocol

Principles of reliable communication TCP basic notations 3 way handshake TCP flow control congestion control

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 3: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Transport services and protocols provide logical

communication between app processes running on different hosts

transport protocols run in end systems send side breaks app

messages into segments passes to network layer

rcv side reassembles segments into messages passes to app layer

more than one transport protocol available to apps Internet TCP and UDP

application

transportnetworkdata linkphysical

application

transportnetworkdata linkphysical

logical end-end transport

Transport Layer 3

Internet transport-layer protocols reliable in-order

delivery (TCP) congestion control flow control connection setup

unreliable unordered delivery UDP no-frills extension of

ldquobest-effortrdquo IP services not available

delay guarantees bandwidth guarantees

application

transportnetworkdata linkphysical network

data linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

application

transportnetworkdata linkphysical

logical end-end transport

Transport Layer 4

Transport vs network layer

network layer logical communication between hosts

transport layer logical communication between processes relies on enhances network layer services

Household analogy12 kids sending letters

to 12 kids processes = kids app messages =

letters in envelopes hosts = houses transport protocol =

Ann and Bill network-layer

protocol = postal service

Transport Layer 5

Multiplexingdemultiplexing

application

transport

network

link

physical

P1 application

transport

network

link

physical

application

transport

network

link

physical

P2P3 P4P1

host 1 host 2 host 3

= process= socket

delivering received segmentsto correct socket

Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)

Multiplexing at send host

Transport Layer 6

How demultiplexing works host receives IP datagrams

each datagram has source IP address destination IP address

each datagram carries 1 transport-layer segment

each segment has source destination port number

host uses IP addresses amp port numbers to direct segment to appropriate socket

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

Transport Layer 7

Connectionless demultiplexing Create sockets with port

numbers UDP socket identified by

two-tuple(dest IP address dest port number)

When host receives UDP segment checks destination port

number in segment directs UDP segment to

socket with that port number

IP datagrams with different source IP addresses andor source port numbers directed to same socket

Transport Layer 8

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428

DP 9157

SP 9157

DP 6428

SP 6428

DP 5775

SP 5775

DP 6428

SP provides ldquoreturn addressrdquo

Transport Layer 9

UDP User Datagram Protocol [RFC 768]

Simplest Internet transport protocol

Each app Output produces exactly one UDP segment

ldquobest effortrdquo service UDP segments may be lost delivered out of order to

app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

Transport Layer 10

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header and dataMinimum value is 8

Bytes

Transport Layer 11

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Transport Layer 12

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Transport Layer 13

Sockets Programming over UDP ndash use socket slides now

Transmission Control Protocol

Principles of reliable communication TCP basic notations 3 way handshake TCP flow control congestion control

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 4: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Internet transport-layer protocols reliable in-order

delivery (TCP) congestion control flow control connection setup

unreliable unordered delivery UDP no-frills extension of

ldquobest-effortrdquo IP services not available

delay guarantees bandwidth guarantees

application

transportnetworkdata linkphysical network

data linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

application

transportnetworkdata linkphysical

logical end-end transport

Transport Layer 4

Transport vs network layer

network layer logical communication between hosts

transport layer logical communication between processes relies on enhances network layer services

Household analogy12 kids sending letters

to 12 kids processes = kids app messages =

letters in envelopes hosts = houses transport protocol =

Ann and Bill network-layer

protocol = postal service

Transport Layer 5

Multiplexingdemultiplexing

application

transport

network

link

physical

P1 application

transport

network

link

physical

application

transport

network

link

physical

P2P3 P4P1

host 1 host 2 host 3

= process= socket

delivering received segmentsto correct socket

Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)

Multiplexing at send host

Transport Layer 6

How demultiplexing works host receives IP datagrams

each datagram has source IP address destination IP address

each datagram carries 1 transport-layer segment

each segment has source destination port number

host uses IP addresses amp port numbers to direct segment to appropriate socket

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

Transport Layer 7

Connectionless demultiplexing Create sockets with port

numbers UDP socket identified by

two-tuple(dest IP address dest port number)

When host receives UDP segment checks destination port

number in segment directs UDP segment to

socket with that port number

IP datagrams with different source IP addresses andor source port numbers directed to same socket

Transport Layer 8

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428

DP 9157

SP 9157

DP 6428

SP 6428

DP 5775

SP 5775

DP 6428

SP provides ldquoreturn addressrdquo

Transport Layer 9

UDP User Datagram Protocol [RFC 768]

Simplest Internet transport protocol

Each app Output produces exactly one UDP segment

ldquobest effortrdquo service UDP segments may be lost delivered out of order to

app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

Transport Layer 10

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header and dataMinimum value is 8

Bytes

Transport Layer 11

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Transport Layer 12

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Transport Layer 13

Sockets Programming over UDP ndash use socket slides now

Transmission Control Protocol

Principles of reliable communication TCP basic notations 3 way handshake TCP flow control congestion control

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 5: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Transport vs network layer

network layer logical communication between hosts

transport layer logical communication between processes relies on enhances network layer services

Household analogy12 kids sending letters

to 12 kids processes = kids app messages =

letters in envelopes hosts = houses transport protocol =

Ann and Bill network-layer

protocol = postal service

Transport Layer 5

Multiplexingdemultiplexing

application

transport

network

link

physical

P1 application

transport

network

link

physical

application

transport

network

link

physical

P2P3 P4P1

host 1 host 2 host 3

= process= socket

delivering received segmentsto correct socket

Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)

Multiplexing at send host

Transport Layer 6

How demultiplexing works host receives IP datagrams

each datagram has source IP address destination IP address

each datagram carries 1 transport-layer segment

each segment has source destination port number

host uses IP addresses amp port numbers to direct segment to appropriate socket

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

Transport Layer 7

Connectionless demultiplexing Create sockets with port

numbers UDP socket identified by

two-tuple(dest IP address dest port number)

When host receives UDP segment checks destination port

number in segment directs UDP segment to

socket with that port number

IP datagrams with different source IP addresses andor source port numbers directed to same socket

Transport Layer 8

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428

DP 9157

SP 9157

DP 6428

SP 6428

DP 5775

SP 5775

DP 6428

SP provides ldquoreturn addressrdquo

Transport Layer 9

UDP User Datagram Protocol [RFC 768]

Simplest Internet transport protocol

Each app Output produces exactly one UDP segment

ldquobest effortrdquo service UDP segments may be lost delivered out of order to

app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

Transport Layer 10

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header and dataMinimum value is 8

Bytes

Transport Layer 11

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Transport Layer 12

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Transport Layer 13

Sockets Programming over UDP ndash use socket slides now

Transmission Control Protocol

Principles of reliable communication TCP basic notations 3 way handshake TCP flow control congestion control

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 6: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Multiplexingdemultiplexing

application

transport

network

link

physical

P1 application

transport

network

link

physical

application

transport

network

link

physical

P2P3 P4P1

host 1 host 2 host 3

= process= socket

delivering received segmentsto correct socket

Demultiplexing at rcv hostgathering data from multiplesockets enveloping data with header (later used for demultiplexing)

Multiplexing at send host

Transport Layer 6

How demultiplexing works host receives IP datagrams

each datagram has source IP address destination IP address

each datagram carries 1 transport-layer segment

each segment has source destination port number

host uses IP addresses amp port numbers to direct segment to appropriate socket

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

Transport Layer 7

Connectionless demultiplexing Create sockets with port

numbers UDP socket identified by

two-tuple(dest IP address dest port number)

When host receives UDP segment checks destination port

number in segment directs UDP segment to

socket with that port number

IP datagrams with different source IP addresses andor source port numbers directed to same socket

Transport Layer 8

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428

DP 9157

SP 9157

DP 6428

SP 6428

DP 5775

SP 5775

DP 6428

SP provides ldquoreturn addressrdquo

Transport Layer 9

UDP User Datagram Protocol [RFC 768]

Simplest Internet transport protocol

Each app Output produces exactly one UDP segment

ldquobest effortrdquo service UDP segments may be lost delivered out of order to

app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

Transport Layer 10

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header and dataMinimum value is 8

Bytes

Transport Layer 11

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Transport Layer 12

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Transport Layer 13

Sockets Programming over UDP ndash use socket slides now

Transmission Control Protocol

Principles of reliable communication TCP basic notations 3 way handshake TCP flow control congestion control

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 7: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

How demultiplexing works host receives IP datagrams

each datagram has source IP address destination IP address

each datagram carries 1 transport-layer segment

each segment has source destination port number

host uses IP addresses amp port numbers to direct segment to appropriate socket

source port dest port

32 bits

applicationdata

(message)

other header fields

TCPUDP segment format

Transport Layer 7

Connectionless demultiplexing Create sockets with port

numbers UDP socket identified by

two-tuple(dest IP address dest port number)

When host receives UDP segment checks destination port

number in segment directs UDP segment to

socket with that port number

IP datagrams with different source IP addresses andor source port numbers directed to same socket

Transport Layer 8

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428

DP 9157

SP 9157

DP 6428

SP 6428

DP 5775

SP 5775

DP 6428

SP provides ldquoreturn addressrdquo

Transport Layer 9

UDP User Datagram Protocol [RFC 768]

Simplest Internet transport protocol

Each app Output produces exactly one UDP segment

ldquobest effortrdquo service UDP segments may be lost delivered out of order to

app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

Transport Layer 10

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header and dataMinimum value is 8

Bytes

Transport Layer 11

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Transport Layer 12

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Transport Layer 13

Sockets Programming over UDP ndash use socket slides now

Transmission Control Protocol

Principles of reliable communication TCP basic notations 3 way handshake TCP flow control congestion control

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 8: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Connectionless demultiplexing Create sockets with port

numbers UDP socket identified by

two-tuple(dest IP address dest port number)

When host receives UDP segment checks destination port

number in segment directs UDP segment to

socket with that port number

IP datagrams with different source IP addresses andor source port numbers directed to same socket

Transport Layer 8

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428

DP 9157

SP 9157

DP 6428

SP 6428

DP 5775

SP 5775

DP 6428

SP provides ldquoreturn addressrdquo

Transport Layer 9

UDP User Datagram Protocol [RFC 768]

Simplest Internet transport protocol

Each app Output produces exactly one UDP segment

ldquobest effortrdquo service UDP segments may be lost delivered out of order to

app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

Transport Layer 10

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header and dataMinimum value is 8

Bytes

Transport Layer 11

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Transport Layer 12

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Transport Layer 13

Sockets Programming over UDP ndash use socket slides now

Transmission Control Protocol

Principles of reliable communication TCP basic notations 3 way handshake TCP flow control congestion control

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 9: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Connectionless demux (cont)

DatagramSocket serverSocket = new DatagramSocket(6428)

ClientIPB

P2

client IP A

P1P1P3

serverIP C

SP 6428

DP 9157

SP 9157

DP 6428

SP 6428

DP 5775

SP 5775

DP 6428

SP provides ldquoreturn addressrdquo

Transport Layer 9

UDP User Datagram Protocol [RFC 768]

Simplest Internet transport protocol

Each app Output produces exactly one UDP segment

ldquobest effortrdquo service UDP segments may be lost delivered out of order to

app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

Transport Layer 10

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header and dataMinimum value is 8

Bytes

Transport Layer 11

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Transport Layer 12

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Transport Layer 13

Sockets Programming over UDP ndash use socket slides now

Transmission Control Protocol

Principles of reliable communication TCP basic notations 3 way handshake TCP flow control congestion control

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 10: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

UDP User Datagram Protocol [RFC 768]

Simplest Internet transport protocol

Each app Output produces exactly one UDP segment

ldquobest effortrdquo service UDP segments may be lost delivered out of order to

app connectionless

no handshaking between UDP sender receiver

each UDP segment handled independently of others

Why is there a UDP no connection

establishment (which can add delay)

simple no connection state at sender receiver

small segment header no congestion control

UDP can blast away as fast as desired

Transport Layer 10

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header and dataMinimum value is 8

Bytes

Transport Layer 11

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Transport Layer 12

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Transport Layer 13

Sockets Programming over UDP ndash use socket slides now

Transmission Control Protocol

Principles of reliable communication TCP basic notations 3 way handshake TCP flow control congestion control

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 11: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

UDP more

often used for streaming multimedia apps loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP add reliability at application layer application-specific

error recovery

source port dest port

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header and dataMinimum value is 8

Bytes

Transport Layer 11

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Transport Layer 12

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Transport Layer 13

Sockets Programming over UDP ndash use socket slides now

Transmission Control Protocol

Principles of reliable communication TCP basic notations 3 way handshake TCP flow control congestion control

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 12: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

UDP checksum

Sender treat segment contents

as sequence of 16-bit integers

checksum addition (1rsquos complement sum) of segment contents

sender puts checksum value into UDP checksum field

Receiver compute checksum of

received segment check if computed

checksum equals checksum field value NO - error detected YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Transport Layer 12

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Transport Layer 13

Sockets Programming over UDP ndash use socket slides now

Transmission Control Protocol

Principles of reliable communication TCP basic notations 3 way handshake TCP flow control congestion control

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 13: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

Transport Layer 13

Sockets Programming over UDP ndash use socket slides now

Transmission Control Protocol

Principles of reliable communication TCP basic notations 3 way handshake TCP flow control congestion control

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 14: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Sockets Programming over UDP ndash use socket slides now

Transmission Control Protocol

Principles of reliable communication TCP basic notations 3 way handshake TCP flow control congestion control

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 15: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Transmission Control Protocol

Principles of reliable communication TCP basic notations 3 way handshake TCP flow control congestion control

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 16: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 16

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 17: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 17

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 18: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Principles of Reliable data transfer important in app transport link layers top-10 list of important networking topics

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Transport Layer 18

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 19: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Reliable Data Transfer Stream stream jargon

Application Layer 19

A stream is a sequence of characters that flow into or out of a process

An input stream is attached to some input source for the process eg keyboard or socket

An output stream is attached to an output source eg monitor or socket

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 20: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Reliable Communication

20Transport Layer

state1

state2

event causing state transitionactions taken on state transition

eventactions

Terminology of a State Machine

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 21: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Reliable communication

First Model sender sends receiver receives

Is this enough When will it work When will it not work

Transport Layer 21

Sender sends one packet then waits for receiver response

stop and wait

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 22: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Reliable Communication

22Transport Layer

State1

state2

Send Data

Data available

Stop and Wait ndash Sender side

Received AckIn State 1

Sender can send data

In State 2 Sender can receive acknowledge packets

Discussion bull Why to send back an ack msgbull What happens if data is available at state

2

Wait

for data Wait

for ack

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 23: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

channel with bit errors and losses

underlying channel may flip bits in packet checksum to detect bit errors

underlying channel can also lose packets the question how to recover from errors

acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK

timeout sender retransmits pkt if doesnrsquot receive ack within timeout

new mechanisms in error detection receiver feedback control msg (ACK) rcvr-gtsender sender control timer to understand if to send again

Transport Layer 23

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 24: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Reliable Communication

24Transport Layer

State1

state2a

Send DataData available

Stop and Wait with errorslosses - sender

Ack receivedIn State 1

Receiver waits for data

Wait for data

Wait for ack packet or time out

state2b

timeout

Send D

ata

Discussion bull What does the sender need to do for

the retransmission

Process ack

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 25: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

This version has a fatal flaw

What happens if ACK corruptedlost sender doesnrsquot know what happened at receiver canrsquot just retransmit possible duplicate

Handling duplicates sender retransmits

current pkt if ACK garbled or didnrsquot arrive

sender adds sequence number to each pkt

receiver discards (doesnrsquot deliver up) duplicate pkt

receiver must specify seq of pkt being ACKed

Transport Layer 25

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 26: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

discussionSender seq added to pkt two seq rsquos (01)

will suffice Why must check if

received ACK corrupted

twice as many states state must

ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiver must check if received

packet is duplicate state indicates whether 0

or 1 is expected pkt seq receiver sends ACK for

last pkt received OK receiver must explicitly

include seq of pkt being ACKed

note receiver can not know if its last ACK received OK at sender

Transport Layer 26

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 27: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Stop amp wait in action

Transport Layer 27

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 28: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Stop amp wait in action

Transport Layer 28

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 29: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Performance of stop amp wait

Stop amp wait works but performance stinks ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

1KB pkt every 30 msec -gt 33kBsec thruput over 1 Gbps link

network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

Transport Layer 29

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 30: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

stop-and-wait operation

first packet bit transmitted t = 0

sender receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Transport Layer 30

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 31: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Pipelined protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pkts range of sequence numbers must be increased buffering at sender andor receiver

Two generic forms of pipelined protocols go-Back-N selective repeat

Transport Layer 31

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 32: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Pipelining increased utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

Transport Layer 32

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 33: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Pipelining Protocol

Transport Layer 33

Go-back-N big picture Sender can have up to N unacked packets in

pipeline Rcvr only sends cumulative acks

Doesnrsquot ack packet if therersquos a gap Sender has timer for oldest unacked packet

If timer expires retransmit all unacked packets

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 34: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Transport Layer 3-34

Go-Back-NSender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed

ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n) retransmit pkt n and all higher seq pkts in

window

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 35: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Go Back N

Transport Layer 35

ACK-only always send ACK for correctly-received pkt with highest in-order seq may generate duplicate ACKs need only remember expectedseqnum

out-of-order pkt discard (donrsquot buffer) -gt no receiver buffering Re-ACK pkt with highest in-order seq

Receiver

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 36: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Transport Layer 3-36

GBN inaction

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 37: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Transport Control Protocol

Enhanced GBN protocol

Segment structure reliable data transfer and data transfer

issues flow control connection management TCP congestion control

Transport Layer 37

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 38: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP Overview RFCs 793 1122 1323 2018 2581

full duplex data bi-directional data flow

in same connection MSS maximum

segment size connection-oriented

handshaking (exchange of control msgs) initrsquos sender receiver state before data exchange

flow controlled sender will not

overwhelm receiver

point-to-point one sender one

receiver reliable in-order byte

steam no ldquomessage

boundariesrdquo pipelined

TCP congestion and flow control set window size

send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Transport Layer 38

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 39: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

Transport Layer 39

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 40: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP reliable data transfer

TCP creates reliable service on top of IPrsquos unreliable service

Pipelined segments Cumulative acks TCP uses single retransmission timer

The sequence number for a segment is the first byte-stream of the first byte in the segment

Transport Layer 40

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 41: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP sender eventsdata rcvd from app Create segment with

seq start timer if not

already running (think of timer as for oldest unacked segment)

expiration interval TimeOutInterval

timeout retransmit segment

that caused timeout restart timer Ack rcvd If acknowledges

previously unacked segments update what is known

to be acked start timer if there are

outstanding segments

Transport Layer 41

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 42: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP seq rsquos and ACKsSeq rsquos

byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs seq of next

byte expected from other side

cumulative ACKQ how receiver

handles out-of-order segments A TCP spec

doesnrsquot say - up to implementor

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Usertypes

lsquoCrsquo

host receivesACK

host ACKsreceipt of

lsquoCrsquo

time

Transport Layer 42

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 43: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP retransmission scenarios

Host A

Seq=100 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Transport Layer 43

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 44: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP retransmission scenarios (more)

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

Transport Layer 44

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 45: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Interactive data flow Overhead for each

packet 40 bytes (20 TCP header + 20 IP header) to a total of 160 bytes for sending and receiving lsquoCrsquo

If the receiver waits a while it can piggyback the data packet

Delayed ack Wait up to 500ms for next segment If no next segment send ACK

Should sender use delayed acks too[Stevens figure 193]

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43

Seq=43 ACK=80

Usertypes

lsquoCrsquo

host ACKsreceipt

of echoedlsquoCrsquo

host ACKsreceipt of

lsquoCrsquo

timesimple telnet scenario

Transport Layer 45

Seq=79 ACK=43 data = lsquoCrsquo

Host echoesback lsquoCrsquo

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 46: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Nagle Algorithm [RFC 896]

Quantifying overhead how much control bytes per data bytes with piggyback 2120-gt only 16 of the bits sent are data

LANs usually not congested so it might be okay

Small packets termed tinygrams over congested WAN ndash bad news

New data canrsquot be sent until outstanding data is acked

Small amounts of data are collected and sent in a single segment when ack arrives

Self clocking the faster the ack comes back the faster data is sent Slow links cause fewer segments to be sent

[Stevens 194]

Transport Layer 46

Naglersquos alg

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 47: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Transport Layer 47

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 48: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Fast Retransmit

Time-out period often relatively long long delay before resending lost packet

Detect lost segments via duplicate ACKs Sender often sends many segments back-to-back If segment is lost there will likely be many duplicate

ACKs

If sender receives 3 ACKs for the same data it supposes that segment after ACKed data was lost fast retransmit resend

segment before timer expires

Transport Layer 48

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 49: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Host A

tim

eout

Host B

time

X

resend 2nd segment

Figure 337 Resending a segment after triple duplicate ACKTransport Layer 49

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 50: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are currently not-yet-acknowledged segments) start timer else increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

Transport Layer 50

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 51: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP Round Trip Time and TimeoutQ how to set TCP

timeout value longer than RTT

but RTT varies too short premature

timeout unnecessary

retransmissions too long slow

reaction to segment loss

Q how to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary want estimated RTT ldquosmootherrdquo average several recent

measurements not just current SampleRTT

Transport Layer 51

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 52: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP Round Trip Time and TimeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value = 0125

Transport Layer 52

[Retransmission example in Stevens 211]

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 53: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Example RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Transport Layer 53

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 54: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP Round Trip Time and TimeoutSetting the timeout EstimtedRTT plus ldquosafety marginrdquo

large variation in EstimatedRTT -gt larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Transport Layer 54

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 55: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP Flow Control

receive side of TCP connection has a receive buffer

speed-matching service matching the send rate to the receiving apprsquos drain rate app process may be

slow at reading from buffer

sender wonrsquot overflow

receiverrsquos buffer bytransmitting too

much too fast

flow control

Transport Layer 55

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 56: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP Flow control how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive

buffer doesnrsquot overflow

Discuss [Stevens 201 206]

Transport Layer 56

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 57: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

initialize TCP variables seq s buffers flow control

info (eg RcvWindow) client connection

initiator Socket clientSocket = new

Socket(hostnameport

number) server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to server specifies initial seq no data

Step 2 server host receives SYN replies with SYNACK segment server allocates buffers specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

Transport Layer 57

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 58: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Delayed Duplicates Problem

A user asks for a connection Due to congestion the packet is caught in a traffic jam The user asks again for the connection Destination accepts 2nd connection request User sends info to dest Info gets caught in a traffic jam User sends info again Dest receives the info Connection is closed by both parties The original connection request and user info find their

way to the destination

Transport Layer 58

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 59: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

Transport Layer 59

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 60: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 60

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 61: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Transport Layer 61

[Tanenbaum 633]

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 62: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Principles of Congestion Control

Congestion informally ldquotoo many sources sending too

much data too fast for network to handlerdquo different from flow control manifestations

lost packets (buffer overflow at routers) long delays (queueing in router buffers)

a top-10 problem

Transport Layer 62

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 63: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Causescosts of congestion scenario 1

two senders two receivers

one router infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Transport Layer 63

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 64: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Causescosts of congestion scenario 2

one router finite buffers sender retransmission of lost packet

finite shared output link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 64

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 65: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Causescosts of congestion scenario 2 always (goodput)

ldquoperfectrdquo retransmission only when loss

retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquocostsrdquo of congestion more work (retrans) for given ldquogoodputrdquo unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Transport Layer 65

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 66: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Causescosts of congestion scenario 3 four senders multihop paths timeoutretransmit

lin

Q what happens as and increase

lin

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Transport Layer 66

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 67: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion when packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Transport Layer 67

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 68: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP congestion control additive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase CongWin by 1

MSS every RTT until loss detected multiplicative decrease cut CongWin in half

after loss

timecong

estio

n w

indo

w s

ize

Saw toothbehavior probing

for bandwidth

Transport Layer 68

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 69: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked CongWin Roughly

CongWin is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (CongWin) after loss event

three mechanisms AIMD slow start conservative after

timeout events

rate = CongWin

RTT Bytessec

Transport Layer 69

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 70: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP Slow Start

When connection begins CongWin = 1 MSS Example MSS = 500

bytes amp RTT = 200 msec

initial rate = 20 kbps available bandwidth

may be gtgt MSSRTT desirable to quickly

ramp up to respectable rate

When connection begins increase rate exponentially fast until first loss event

Transport Layer 70

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 71: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP Slow Start (more)

When connection begins increase rate exponentially until first loss event double CongWin every RTT done by incrementing CongWin for every ACK

received Summary initial rate is slow but ramps up

exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Transport Layer 71

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 72: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Refinement inferring loss

After 3 dup ACKs CongWin is cut in half window then grows

linearly But after timeout event

CongWin instead set to 1 MSS

window then grows exponentially

to a threshold then grows linearly

3 dup ACKs indicates

network capable of delivering some segments

timeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer 72

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 73: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Refinement

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation Variable Threshold At loss event Threshold

is set to 12 of CongWin just before loss event

Transport Layer 73

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 74: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Summary TCP Congestion Control

When CongWin is below Threshold sender in slow-start phase window grows exponentially

When CongWin is above Threshold sender is in congestion-avoidance phase window grows linearly

When a triple duplicate ACK occurs Threshold set to CongWin2 and CongWin set to Threshold

When timeout occurs Threshold set to CongWin2 and CongWin is set to 1 MSS

Transport Layer 74

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 75: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP sender congestion control

State Event TCP Sender Action Commentary

Slow Start (SS)

ACK receipt for previously unacked data

CongWin = CongWin + MSS If (CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

CongestionAvoidance (CA)

ACK receipt for previously unacked data

CongWin = CongWin+MSS (MSSCongWin)

Additive increase resulting in increase of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin2 CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin2 CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter slow start

SS or CA Duplicate ACK

Increment duplicate ACK count for segment being acked

CongWin and Threshold not changed

Transport Layer 75

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 76: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP throughput

Whatrsquos the average throughout of TCP as a function of window size and RTT Ignore slow start

Let W be the window size when loss occurs

When window is W throughput is WRTT Just after loss window drops to W2

throughput to W2RTT Average throughout 75 WRTT

Transport Layer 76

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 77: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

TCP Futures TCP over ldquolong fat pipesrdquo

Example 1500 byte segments 100ms RTT want 10 Gbps throughput

Requires window size W = 83333 in-flight segments

Throughput in terms of loss rate

L = 210-10 Wow New versions of TCP for high-speed

LRTT

MSS221

Transport Layer 77

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 78: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckrouter

capacity R

TCP connection 2

TCP Fairness

Transport Layer 78

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 79: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Why is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughputConnect

ion 2

th

roughput

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Transport Layer 79

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 80: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Fairness (more)

Fairness and UDP Multimedia apps

often do not use TCP do not want rate

throttled by congestion control

Instead use UDP pump audiovideo at

constant rate tolerate packet loss

Research area TCP friendly

Fairness and parallel TCP connections

nothing prevents app from opening parallel connections between 2 hosts

Web browsers do this Example link of rate R

supporting 9 connections new app asks for 1 TCP

gets rate R10 new app asks for 11 TCPs

gets R2

Transport Layer 80

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 81: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Chapter 3 Summary principles behind

transport layer services multiplexing

demultiplexing reliable data transfer flow control congestion control

instantiation and implementation in the Internet UDP TCP

After that leaving the

network ldquoedgerdquo (application transport layers)

into the network ldquocorerdquo

Transport Layer 81

Next Socket

programming over TCP

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 82: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Connection-oriented demux

TCP socket identified by 4-tuple source IP address source port number dest IP address dest port number

recv host uses all four values to direct segment to appropriate socket

Server host may support many simultaneous TCP sockets each socket identified

by its own 4-tuple Web servers have

different sockets for each connecting client non-persistent HTTP will

have different socket for each request

Transport Layer 82

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 83: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Connection-oriented demux (cont)

ClientIPB

P1

client IP A

P1P2P4

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P5 P6 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 83

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server
Page 84: Transport Layer By Ossi Mokryn, Based also on slides from: the Computer Networking: A Top Down Approach Featuring the Internet by Kurose and Ross

Connection-oriented demux Threaded Web Server

ClientIPB

P1

client IP A

P1P2

serverIP C

SP 9157

DP 80

SP 9157

DP 80

P4 P3

D-IPCS-IP A

D-IPC

S-IP B

SP 5775

DP 80

D-IPCS-IP B

Transport Layer 84

  • Transport Layer
  • Transport Layer (2)
  • Transport services and protocols
  • Internet transport-layer protocols
  • Transport vs network layer
  • Multiplexingdemultiplexing
  • How demultiplexing works
  • Connectionless demultiplexing
  • Connectionless demux (cont)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Sockets Programming over UDP ndash use socket slides now
  • Transmission Control Protocol
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable data transfer (3)
  • Reliable Data Transfer Stream stream jargon
  • Reliable Communication
  • Reliable communication
  • Reliable Communication (2)
  • channel with bit errors and losses
  • Reliable Communication (3)
  • This version has a fatal flaw
  • discussion
  • Stop amp wait in action
  • Stop amp wait in action (2)
  • Performance of stop amp wait
  • stop-and-wait operation
  • Pipelined protocols
  • Pipelining increased utilization
  • Pipelining Protocol
  • Go-Back-N
  • Go Back N
  • GBN in action
  • Transport Control Protocol
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP segment structure
  • TCP reliable data transfer
  • TCP sender events
  • TCP seq rsquos and ACKs
  • TCP retransmission scenarios
  • TCP retransmission scenarios (more)
  • Interactive data flow
  • Nagle Algorithm [RFC 896]
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Slide 49
  • Fast retransmit algorithm
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • Example RTT estimation
  • TCP Round Trip Time and Timeout (3)
  • TCP Flow Control
  • TCP Flow control how it works
  • TCP Connection Management
  • Delayed Duplicates Problem
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont)
  • Principles of Congestion Control
  • Causescosts of congestion scenario 1
  • Causescosts of congestion scenario 2
  • Causescosts of congestion scenario 2 (2)
  • Causescosts of congestion scenario 3
  • Causescosts of congestion scenario 3 (2)
  • TCP congestion control additive increase multiplicative decre
  • TCP Congestion Control details
  • TCP Slow Start
  • TCP Slow Start (more)
  • Refinement inferring loss
  • Refinement
  • Summary TCP Congestion Control
  • TCP sender congestion control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
  • Connection-oriented demux
  • Connection-oriented demux (cont)
  • Connection-oriented demux Threaded Web Server