Advanced Networks 2002

Transport Layer

Michalis Faloutsos

Many slides from Kurose-Ross

Transport Layer Functionality

Hide network from application layer

Transport layer resides at end points

Sees the network as a black box

[Figure: the sending application writes data through its socket door into the TCP send buffer; segments carry the data to the TCP receive buffer, from which the receiving application reads data through its socket door]

Transport Layers of the Internet

TCP: reliable protocol
• Guarantees end-to-end delivery
• Self-controls rate: congestion and flow control
• Connection oriented: handshake, state
• Ordered delivery of packets to application

UDP: unreliable protocol
• Non-regulated sending rate
• Multiplexing-demultiplexing

TCP overview

TCP: What and How

For more: RFCs 793, 1122, 1323, 2018, 2581

full duplex data:
• bi-directional data flow in same connection
• MSS: maximum segment size

connection-oriented:
• handshaking (exchange of control msgs) initializes sender, receiver state before data exchange

flow controlled:
• sender will not overwhelm receiver

point-to-point:
• one sender, one receiver

reliable, in-order byte stream:
• no “message boundaries”

pipelined:
• TCP congestion and flow control set window size

send & receive buffers

[Figure: socket door, TCP send buffer, segment, TCP receive buffer, socket door; the application writes data on one side and reads data on the other]

TCP segment structure

Header layout (each row is 32 bits wide):
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | rcvr window size
checksum | ptr urgent data
Options (variable length)
application data (variable length)

Field notes:
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• rcvr window size: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
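The header layout above can be read off a raw segment with a few lines of Python. This is a minimal sketch that decodes only the fixed 20-byte part of the header and ignores options; the function name `parse_tcp_header` and the example segment are made up for illustration.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are skipped, not parsed)."""
    (src_port, dst_port, seq, ack, offset_reserved, flags,
     window, checksum, urgent_ptr) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (offset_reserved >> 4) * 4  # head len is counted in 32-bit words
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "URG": bool(flags & 0x20),
        "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08),
        "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02),
        "FIN": bool(flags & 0x01),
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgent_ptr,
        "data": segment[header_len:],
    }

# Hypothetical segment: port 1234 -> 80, Seq=42, ACK=79, ACK flag set, one data byte 'C'.
example = struct.pack("!HHIIBBHHH", 1234, 80, 42, 79, 5 << 4, 0x10, 4096, 0, 0) + b"C"
fields = parse_tcp_header(example)
```

The checksum in the example is left at zero; a real stack would compute and verify it.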

TCP overview

TCP is a sliding window protocol
• Sender can have (Window) bytes in flight

Operates with cumulative ACKs

It includes control for the sending rate
• Flow control: receiver-set sending rate
• Congestion control: network-aware sending rate

[Figure: sliding window of size Congwin]

TCP seq. #’s and ACKs

Seq. #’s:
• byte stream “number” of first byte in segment’s data

ACKs:
• seq # of next byte expected from other side
• cumulative ACK

Q: how receiver handles out-of-order segments
• A: TCP spec doesn’t say, up to implementor

Simple telnet scenario (Host A and Host B, time runs downward):
• User types ‘C’. A -> B: Seq=42, ACK=79, data = ‘C’
• Host B ACKs receipt of ‘C’, echoes back ‘C’. B -> A: Seq=79, ACK=43, data = ‘C’
• Host A ACKs receipt of echoed ‘C’. A -> B: Seq=43, ACK=80
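The numbers in the telnet scenario follow from one rule: the ACK number is the sequence number of the next byte expected from the other side. A small sketch that recomputes them (the helper `ack_for` is a name invented here):

```python
def ack_for(seq: int, data: bytes) -> int:
    """ACK number = sequence number of the next byte expected from the other side."""
    return seq + len(data)

# Host A sends 'C' with Seq=42 and is expecting byte 79 from Host B.
a_seq, a_ack = 42, 79

# Host B echoes 'C': it starts at the byte A expects and ACKs A's single byte.
b_seq = a_ack
b_ack = ack_for(a_seq, b"C")

# Host A ACKs the echo: it continues at the byte B expects and ACKs B's single byte.
a_seq2 = b_ack
a_ack2 = ack_for(b_seq, b"C")

exchange = [("A->B", a_seq, a_ack), ("B->A", b_seq, b_ack), ("A->B", a_seq2, a_ack2)]
```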

TCP in a nutshell

I. Slow start phase (actually this is fast increase)
• Start with a window of 1 (or 2)
• Successful ACK: increase window by 1 max size segment
• Do this up to a threshold: ssthresh

II. Congestion control phase
• Increase window by 1 max size segment every RTT
• Drop window in half, if there is congestion

Packet loss is signaled by:
• duplicate ACKs
• time expiration

TCP Congestion Control

end-end control (no network assistance)

transmission rate limited by congestion window size, Congwin, over segments:

[Figure: window of size Congwin over the segment stream]

w segments, each with MSS bytes sent in one RTT:

throughput = (w * MSS) / RTT   Bytes/sec

TCP congestion control: Intuition

TCP is “probing” for usable bandwidth:

ideally: transmit as fast as possible (Congwin as large as possible) without loss

increase Congwin until loss (congestion)

loss: decrease Congwin, then begin probing (increasing) again

TCP congestion control:

TCP has two “phases”
• slow start: start from small, increase quickly
• congestion avoidance: Additive Increase Multiplicative Decrease

important variables:
• Congwin
• threshold: defines the boundary between the slow start phase and the congestion control phase

TCP Slowstart

Slowstart algorithm:

initialize: Congwin = 1
for (each segment ACKed)
    Congwin++
until (loss event OR CongWin > threshold)

exponential increase (per RTT) in window size

loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)

[Figure: Host A sends one segment to Host B; one RTT later it sends two segments, then four segments]
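The doubling per RTT in the slowstart algorithm can be traced with a short sketch. It assumes no loss and that every segment of the current window is ACKed within one RTT; the function name `slow_start` and the `max_rtts` cap are choices made here.

```python
def slow_start(threshold: int, max_rtts: int = 20) -> list:
    """Return Congwin (in segments) at the start of each RTT, assuming no loss event."""
    congwin = 1
    history = [congwin]
    for _ in range(max_rtts):
        if congwin > threshold:
            break  # slowstart is over
        # Each of the congwin segments sent this RTT is ACKed, and each ACK does Congwin++.
        congwin += congwin
        history.append(congwin)
    return history

trace = slow_start(threshold=8)
```

With a threshold of 8 the window goes 1, 2, 4, 8, 16: the loop stops only once Congwin exceeds the threshold, as in the pseudocode.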

Why Call it Slow Start?

The original version of TCP suggested that the sender transmit as much as the Advertised Window permitted.

Routers may not be able to cope with this “burst” of transmissions.

Slow start is slower than the above version -- ensures that a transmission burst does not happen at once.

TCP Congestion Avoidance

Congestion avoidance:

/* slowstart is over */
/* Congwin > threshold */
Until (loss event) {
    every w segments ACKed:
        Congwin++
}
threshold = Congwin/2
Congwin = 1
perform slowstart (1)

(1) TCP Reno skips slowstart (fast recovery) after three duplicate ACKs
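Putting slowstart and congestion avoidance together gives the familiar sawtooth. The sketch below follows the Tahoe-style pseudocode (on loss: threshold = Congwin/2, Congwin = 1, slowstart again) at per-RTT granularity. The function name, the choice to model loss as a set of RTT indices, and the use of integer division for Congwin/2 are assumptions made for illustration.

```python
def tahoe_trace(loss_rtts: set, threshold: int = 8, rtts: int = 12) -> list:
    """Congwin (in segments) at the start of each RTT; a loss event is detected in each RTT listed in loss_rtts."""
    congwin = 1
    trace = []
    for rtt in range(rtts):
        trace.append(congwin)
        if rtt in loss_rtts:
            threshold = congwin // 2   # threshold = Congwin/2 (integer division assumed)
            congwin = 1                # perform slowstart
        elif congwin > threshold:
            congwin += 1               # congestion avoidance: w segments ACKed -> Congwin++
        else:
            congwin *= 2               # slowstart: Congwin++ per ACKed segment
    return trace

trace = tahoe_trace(loss_rtts={6})
```

After the loss in RTT 6 the threshold drops to 9 and the window restarts from 1.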

TCP Congestion: Real Life is Hairy!

Remember: bytes vs packets!

CW += MSS * MSS/CW

Thres = Max(2*MSS, InFlightData/2)

MSS: max segment size

InFlightData: un-ACK-ed data

These are the byte-based counterparts of "Congwin++ every w segments ACKed" and "threshold = Congwin/2" in the congestion avoidance pseudocode.

RFC 2581: TCP Congestion Control
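To see why the byte-based update grows the window by roughly one MSS per RTT, apply it once per ACK for one window's worth of ACKs. This is a rough sketch: the MSS value of 1460 bytes and the function names are assumptions, and one ACK per full-sized segment is assumed.

```python
MSS = 1460  # bytes; an assumed, commonly seen value

def on_ack_congestion_avoidance(cw: float) -> float:
    """CW += MSS * MSS / CW, applied on every ACK during congestion avoidance."""
    return cw + MSS * MSS / cw

def threshold_after_loss(in_flight_data: float) -> float:
    """Thres = Max(2*MSS, InFlightData/2)."""
    return max(2 * MSS, in_flight_data / 2)

cw = 10.0 * MSS
acks_per_rtt = int(cw // MSS)          # one ACK per full-sized segment in the window
for _ in range(acks_per_rtt):
    cw = on_ack_congestion_avoidance(cw)
growth_in_mss = (cw - 10 * MSS) / MSS  # a little under 1 MSS, since CW grows during the RTT
```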

TCP Fairness and AIMD

Fairness goal: if N TCP sessions share same bottleneck link, each should get 1/N of link capacity

TCP congestion avoidance:

AIMD: additive increase, multiplicative decrease
• increase window by 1 per RTT
• decrease “window” by factor of 2 on loss event

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions:
• Additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
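The fairness argument can be checked numerically. In this sketch both connections add 1 per step while the link has room and both halve when their combined rate exceeds R. Synchronized loss, equal RTTs, and the function name are simplifying assumptions made here.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, steps: int) -> tuple:
    """Evolve two AIMD flows sharing one bottleneck; both see every loss event."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # multiplicative decrease halves the gap too
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase keeps the gap unchanged
    return x1, x2

r1, r2 = aimd_two_flows(10.0, 70.0, capacity=100.0, steps=400)
gap = abs(r1 - r2)
```

The gap between the two rates starts at 60 and is halved at every loss event, so the rates drift toward the equal bandwidth share.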

Macroscopic Description of Throughput

Assume window toggling: W/2 to W

High rate: W * MSS / RTT

Low rate: W * MSS / (2 * RTT)

Rate increases linearly between the two extremes

Average throughput:
• 0.75 * W * MSS / RTT
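The 0.75 factor is just the mean of a quantity that ramps linearly between its low and high values. Writing the average window as \(\overline{w}\) (a symbol introduced here):

```latex
\[
\overline{w} \;=\; \frac{1}{2}\left(\frac{W}{2} + W\right) \;=\; \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\text{average throughput} \;=\; \frac{\overline{w}\cdot \mathit{MSS}}{\mathit{RTT}}
\;=\; 0.75\,\frac{W \cdot \mathit{MSS}}{\mathit{RTT}}
\]
```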

TCP: reliable data transfer

Simplified sender, assuming:
• one way data transfer
• no flow, congestion control

[Figure: sender state machine with a single "wait for event" state and three transitions]
• event: data received from application above -> create, send segment
• event: timer timeout for segment with seq # y -> retransmit segment
• event: ACK received, with ACK # y -> ACK processing

Simplified TCP sender

sendbase = initial_sequence number
nextseqnum = initial_sequence number

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)

    event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y

    event: ACK received, with ACK field value of y
        if (y > sendbase) { /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
        }
        else { /* a duplicate ACK for already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y == 3) {
                /* TCP fast retransmit */
                resend segment with sequence number y
                restart timer for segment y
            }
        }
} /* end of loop forever */
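The sender pseudocode maps onto a small event-driven class. This is a sketch only: timers are reduced to a set of sequence numbers, "pass segment to IP" is replaced by appending to a log, and the class and attribute names are invented here.

```python
class SimplifiedSender:
    """Event handlers of the simplified TCP sender (one-way data, no flow or congestion control)."""

    def __init__(self, initial_seq: int = 0):
        self.sendbase = initial_seq
        self.nextseqnum = initial_seq
        self.unacked = {}      # sequence number -> data of segments not yet ACKed
        self.timers = set()    # sequence numbers with a running timer
        self.dup_acks = {}     # ACK value -> number of duplicate ACKs seen
        self.log = []          # stands in for passing segments to IP

    def on_data(self, data: bytes) -> None:
        seq = self.nextseqnum
        self.unacked[seq] = data
        self.timers.add(seq)
        self.log.append(("send", seq))
        self.nextseqnum += len(data)

    def on_timeout(self, y: int) -> None:
        if y in self.unacked:
            self.log.append(("retransmit", y))
            self.timers.add(y)  # restart timer (new interval not modeled)

    def on_ack(self, y: int) -> None:
        if y > self.sendbase:   # cumulative ACK of all data up to y
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
                self.timers.discard(seq)
            self.sendbase = y
        else:                   # duplicate ACK for already ACKed data
            self.dup_acks[y] = self.dup_acks.get(y, 0) + 1
            if self.dup_acks[y] == 3 and y in self.unacked:
                self.log.append(("fast retransmit", y))
                self.timers.add(y)

sender = SimplifiedSender(initial_seq=92)
sender.on_data(b"12345678")   # Seq=92, 8 bytes
sender.on_data(b"x" * 20)     # Seq=100, 20 bytes
sender.on_ack(100)            # new cumulative ACK
for _ in range(3):
    sender.on_ack(100)        # three duplicate ACKs trigger fast retransmit
```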

TCP Receiver: ACK generation [RFC 1122, RFC 2581]

Event -> TCP Receiver action

• in-order segment arrival, no gaps, everything else already ACKed -> delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• in-order segment arrival, no gaps, one delayed ACK pending -> immediately send single cumulative ACK
• out-of-order segment arrival, higher-than-expected seq. #, gap detected -> send duplicate ACK, indicating seq. # of next expected byte
• arrival of segment that partially or completely fills gap -> immediate ACK if segment starts at lower end of gap
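The four rows of the ACK generation table can be written as one decision function. This sketch only names the action to take; it does not buffer data or run the 500 ms timer, and the function and parameter names are invented for illustration.

```python
def receiver_ack_action(seg_seq: int, expected_seq: int,
                        delayed_ack_pending: bool, gap_exists: bool) -> str:
    """Pick the receiver action for an arriving segment, following the four table rows."""
    if seg_seq > expected_seq:
        # out-of-order arrival, gap detected
        return "send duplicate ACK for next expected byte"
    if seg_seq == expected_seq and gap_exists:
        # segment starts at the lower end of the gap
        return "send immediate ACK"
    if seg_seq == expected_seq and delayed_ack_pending:
        return "immediately send single cumulative ACK"
    if seg_seq == expected_seq:
        return "delay ACK up to 500 ms"
    return "case not covered by the table"

actions = [
    receiver_ack_action(100, 100, delayed_ack_pending=False, gap_exists=False),
    receiver_ack_action(100, 100, delayed_ack_pending=True, gap_exists=False),
    receiver_ack_action(120, 100, delayed_ack_pending=False, gap_exists=False),
    receiver_ack_action(100, 100, delayed_ack_pending=False, gap_exists=True),
]
```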

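TCP: retransmission scenarios

Lost ACK scenario (Host A and Host B, time runs downward):
• A -> B: Seq=92, 8 bytes data
• B -> A: ACK=100 (lost, marked X)
• A's timeout for Seq=92 expires. A -> B: Seq=92, 8 bytes data
• B -> A: ACK=100

Premature timeout, cumulative ACKs (Host A and Host B, time runs downward):
• A -> B: Seq=92, 8 bytes data
• A -> B: Seq=100, 20 bytes data
• B -> A: ACK=100
• B -> A: ACK=120
• A's Seq=92 timeout expires before the ACK arrives. A -> B: Seq=92, 8 bytes data
• B -> A: ACK=120
• The Seq=100 timeout runs in parallel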

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

longer than RTT
• note: RTT will vary

too short: premature timeout
• unnecessary retransmissions

too long: slow reaction to segment loss

Q: how to estimate RTT?

SampleRTT: measured time from segment transmission until ACK receipt
• ignore retransmissions, cumulatively ACKed segments

SampleRTT will vary, want estimated RTT “smoother”
• use several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT

Exponential weighted moving average
• influence of given sample decreases exponentially fast
• typical value of x: 0.1

Setting the timeout
• EstimatedRTT plus “safety margin”
• large variation in EstimatedRTT -> larger safety margin

Timeout = EstimatedRTT + 4*Deviation

Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|
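The three formulas fit in a few lines of code. Two details are assumptions of this sketch, not stated on the slide: Deviation starts at 0, and Deviation is updated against the EstimatedRTT value from before the current sample is folded in. The class name `RttEstimator` is also invented here.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout, following the formulas above."""

    def __init__(self, first_sample: float, x: float = 0.1):
        self.x = x
        self.estimated_rtt = first_sample
        self.deviation = 0.0   # assumed starting value

    def update(self, sample_rtt: float) -> None:
        # Deviation = (1-x)*Deviation + x*|SampleRTT - EstimatedRTT| (old estimate assumed)
        self.deviation = ((1 - self.x) * self.deviation
                          + self.x * abs(sample_rtt - self.estimated_rtt))
        # EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
        self.estimated_rtt = (1 - self.x) * self.estimated_rtt + self.x * sample_rtt

    def timeout(self) -> float:
        # Timeout = EstimatedRTT + 4*Deviation
        return self.estimated_rtt + 4 * self.deviation

est = RttEstimator(first_sample=100.0)   # milliseconds
est.update(200.0)                        # one unusually slow sample
```

A single 200 ms sample moves the estimate only from 100 ms to about 110 ms, but it widens the safety margin, so the timeout rises to about 150 ms.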

A problem

[Figure: two sender/receiver timelines, (a) and (b), each showing an original transmission, a retransmission, and a single ACK]

• When there are retransmissions, it is unclear if the ACK is for the original transmission or for a retransmission.

• How do we overcome this?

The Karn Partridge Algorithm

Take SampleRTT measurements only for segments that have been sent once!

This eliminates the possibility that wrong RTT estimates are factored into the estimation.

Another change -- each time TCP retransmits, it sets the next timeout to 2 x last timeout. This is called the Exponential Back-off (primarily for avoiding congestion).
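Both rules are easy to express in code. In this minimal sketch the class name, the `was_retransmitted` flag, and the list of collected samples are illustrative choices; how the timeout is reset after a successful sample is left out.

```python
class KarnPartridgeTimer:
    """Sample RTT only for segments sent once; double the timeout on every retransmission."""

    def __init__(self, initial_timeout: float):
        self.timeout = initial_timeout
        self.samples = []   # SampleRTT values that are safe to feed into the estimator

    def on_retransmit(self) -> None:
        self.timeout *= 2   # exponential back-off

    def on_ack(self, measured_rtt: float, was_retransmitted: bool) -> None:
        if not was_retransmitted:
            self.samples.append(measured_rtt)
        # ACKs for retransmitted segments are ambiguous, so they yield no sample

timer = KarnPartridgeTimer(initial_timeout=1.0)
timer.on_retransmit()
timer.on_retransmit()
timer.on_ack(0.3, was_retransmitted=True)    # ignored
timer.on_ack(0.2, was_retransmitted=False)   # recorded
```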

Jacobson Karels Algorithm

An issue with the Karn/Partridge scheme is that it does not take into account the variation between RTT samples.

New method proposed -- the Jacobson Karels Algorithm.

EstimatedRTT = EstimatedRTT + δ * Difference
• Difference = SampleRTT - EstimatedRTT

Deviation = Deviation + δ * (|Difference| - Deviation)

Timeout = μ * EstimatedRTT + φ * Deviation

The values of μ and φ are computed based on experience -- typically μ = 1 and φ = 4.

Silly Window Syndrome

Suppose an MSS worth of data is collected and the advertised window is MSS/2.

What should the sender do? -- transmit half full segments or wait to send a full MSS when window opens?

Early implementations were aggressive -- transmit MSS/2.

Aggressively doing this would consistently result in small segment sizes -- called the Silly Window Syndrome.

Issues ..

We cannot eliminate the possibility of small segments being sent.

However, we can introduce methods to coalesce small chunks.
• Delaying ACKs -- receiver does not send ACKs as soon as it receives segments. How long to delay? Not very clear.
• Ultimate solution falls to the sender -- when should I transmit?

Nagle’s Algorithm

If sender waits too long --> bad for interactive connections.

If it does not wait long enough -- silly window syndrome.

How do we solve this?

Timer -- clock based
• If both available data and Window ≥ MSS, send full segment.
• Else, if there is unACKed data in flight, buffer new data until ACK returns.
• Else, send new data now.

Note -- Socket interface allows some applications to turn off Nagle’s algorithm by setting the TCP_NODELAY option.
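The three-way rule can be written as a small decision function, and the socket option mentioned in the note is available from Python's standard `socket` module. The function names `nagle_decision` and `disable_nagle` are invented for this sketch; `socket.IPPROTO_TCP` and `socket.TCP_NODELAY` are the standard constants.

```python
import socket

def nagle_decision(available_data: int, window: int, mss: int, unacked_in_flight: bool) -> str:
    """Decide what the sender does when the application hands it new data."""
    if available_data >= mss and window >= mss:
        return "send full segment"
    if unacked_in_flight:
        return "buffer until ACK returns"
    return "send now"

def disable_nagle(sock: socket.socket) -> None:
    """Turn off Nagle's algorithm on a TCP socket, e.g. for an interactive application."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

decisions = [
    nagle_decision(3000, 4000, 1460, unacked_in_flight=True),
    nagle_decision(10, 4000, 1460, unacked_in_flight=True),
    nagle_decision(10, 4000, 1460, unacked_in_flight=False),
]
```

The second case is the one that hurts interactive traffic: a 10-byte keystroke waits for the outstanding ACK before it is sent.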

TCP Connection Management

Recall: TCP sender, receiver establish “connection” before exchanging data segments

initialize TCP variables:
• seq. #s
• buffers, flow control info (e.g. RcvWindow)

client: connection initiator
    Socket clientSocket = new Socket("hostname","port number");

server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();

TCP Set-up

Three way handshake:

Step 1: client end system sends TCP SYN control segment to server
• specifies initial seq #

Step 2: server end system receives SYN, replies with SYNACK control segment
• ACKs received SYN
• allocates buffers
• specifies server -> client initial seq. #

Step 3: Client replies with an ACK (using server's seq number)
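The sequence and ACK numbers carried by the three segments can be worked out from the two initial sequence numbers. The sketch below relies on the fact that a SYN consumes one sequence number, so each side ACKs the other's initial seq # plus one. The function name, the dictionary layout, and the example initial sequence numbers are made up for illustration.

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the SYN, SYNACK and ACK segments exchanged during TCP set-up."""
    syn = {"from": "client", "flags": {"SYN"}, "seq": client_isn}
    synack = {"from": "server", "flags": {"SYN", "ACK"},
              "seq": server_isn, "ack": client_isn + 1}   # ACKs the received SYN
    ack = {"from": "client", "flags": {"ACK"},
           "seq": client_isn + 1, "ack": server_isn + 1}  # uses the server's seq number
    return [syn, synack, ack]

segments = three_way_handshake(client_isn=1000, server_isn=5000)
```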

TCP Connection Management (cont.)

Closing a connection:

client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

Last ACK is never ACK-ed!!

[Figure: client/server timeline. Client close: FIN to server. Server: ACK, then close and FIN to client. Client: ACK, timed wait, then closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.
• Enters “timed wait” - will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Last ACK is never ACK-ed

[Figure: client/server timeline. Client closing: FIN to server. Server: ACK, then closing and FIN to client. Client: ACK, then timed wait. Server closed on receiving the ACK; client closed after the timed wait]