TCP End-To-End Congestion Control

Wanida Putthividhya
Dept. of Computer Science
Iowa State University
Jan 27th, 2002 (May 25th, 2001)



Contents :

- TCP Congestion Control Concepts

- TCP Flavors

TCP Congestion Control

- Obey a ‘packet conservation’ principle :

“ In equilibrium, a new packet is not put into the network until an old packet leaves ”

- Avoid ‘congestion collapses’ :

“ The severe drop of the network throughput caused by the congestion ”

- A collection of collaborating mechanisms :

Slow-Start

Accurate Retransmission Timeout Estimation

Congestion Avoidance

Fast Retransmit

Fast Recovery

Selective Acknowledgement

TCP Basics

- Congestion Window (cwnd) :

“ A TCP state variable that limits the amount of data a TCP can send ”

“ The window at the sender site controlled by congestion control and avoidance algorithms ”

- Advertised Window (Receiver Window) :

“ The available buffer size at the receiver site ”

- Sender’s maximum window (maxwin) :

“ min(cwnd, advertised window) ”

- Sender’s usable window :

“ maxwin - unacknowledged segments ”

- TCP maintains a Retransmission Timer for each packet, say x, which has been sent and not yet acknowledged.

If the ACK for packet x does not reach the sender before its timer expires, packet x is assumed to be lost and the sender retransmits it.

Self-Clocking

- The ‘packet conservation’ property can be expressed as:

“ The sender can inject a new data packet into the network only when it receives an ‘ACK’ from the receiver ”

So, the protocol is self-clocking !

“ The sender uses ACKs as a ‘clock’ to strobe new packets into the network ”

- However, how is the clock started ?

The problem is :

“ An ACK is generated when the receiver receives a data packet correctly ” and “ To make the system robust, a data packet is injected into the network only when an ACK triggers the sender to do so ”

- Answer:

“ A new algorithm called ‘Slow-Start’ has been introduced to gradually increase the amount of data in transit ”

[Figure: sender and receiver connected through a bottleneck link, annotated with the packet spacings Pb, Pr and the ACK spacings Ar, Ab, As]

Pb : the minimum packet spacing (the inter-packet interval) on the bottleneck link

Pr : the receiver’s network packet spacing [Pb = Pr]

Ar : the spacing between acks on the receiver’s network [if the processing time is the same for all packets, Pb = Pr = Ar]

Ab : the ack spacing on the bottleneck link

As : the ack spacing on the sender’s network [As = Pb]

Getting to Equilibrium: Slow-Start Algorithm

- When starting, or when restarting after a loss, initialize ‘cwnd’ to 1 : cwnd = 1

- Every time the sender sends data packets, it sends at most : min ( cwnd, advertised window ) - # unacked packets

- Upon receiving an ACK for new data, increase the congestion window by one : cwnd = cwnd + 1

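[Figure: slow-start chronology, with the axes marked “one RTT” and “one pkt time”. Round 0R: packet 1; round 1R: packets 2 - 3; round 2R: packets 4 - 7; round 3R: packets 8 - 15. The returning ACKs are numbered 1; 2 3; 4 5 6 7.]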

- However, slow-start is not that slow at opening the sender’s congestion window :

“ Let W be the window size (packets) and RTT be the round-trip time. It takes time RTT * log2 W to open the congestion window from 1 to W ”

- Therefore, the window opens fast enough to have a negligible effect on performance
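The RTT * log2 W estimate can be checked with a small sketch. It assumes an idealized sender in which every packet of a window is ACKed within one round trip and no ACK is delayed; the function name is hypothetical.

```python
import math

def rtts_to_open(target_window: int) -> int:
    """Count the round trips slow-start needs to grow cwnd from 1 to target_window."""
    cwnd = 1
    rtts = 0
    while cwnd < target_window:
        # One ACK returns per outstanding packet and each ACK adds 1 to cwnd,
        # so the window doubles every round trip.
        cwnd += cwnd
        rtts += 1
    return rtts

for w in (8, 64, 1024):
    # The simulated count matches ceil(log2 W).
    print(w, rtts_to_open(w), math.ceil(math.log2(w)))
```

With these assumptions, a window of 1024 packets is reached in 10 round trips.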

Conservation at equilibrium: round-trip timing

- Once data is flowing reliably, a sender that injects a new packet before an old packet has exited must be suffering from a failure of its retransmission timer

- TCP estimates the retransmission timer for each packet in terms of the RTT ( wait at least one RTT before retransmitting ! )

- Too short an RTT estimate => unnecessary retransmissions

Too long an RTT estimate => low throughput

- What model should be used to estimate the RTT ?

“ The estimated RTT must adapt to the condition of the network, but neither too fast nor too slowly ”

- Initial RTO estimator:

New RTT = α * old RTT + (1 - α) * M

where M : a round trip time measurement from the most recently acked data packet (Round Trip Sample)

α : a filter gain constant with suggested value of 0.9

RTO = β * New RTT

where β : accounts for RTT variation with suggested value of 2
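A minimal sketch of this initial estimator, using the suggested filter gain of 0.9 and variation factor of 2. The function and parameter names are hypothetical, and the millisecond samples are made up for illustration.

```python
from typing import Tuple

ALPHA = 0.9  # filter gain (suggested value from the slide)
BETA = 2.0   # RTT variation factor (suggested value from the slide)

def update_rto(old_rtt: float, sample: float,
               alpha: float = ALPHA, beta: float = BETA) -> Tuple[float, float]:
    """Return (new smoothed RTT, new RTO) after one round trip sample M."""
    new_rtt = alpha * old_rtt + (1 - alpha) * sample
    return new_rtt, beta * new_rtt

rtt = 100.0  # illustrative starting estimate, in ms
for m in (100.0, 120.0, 300.0):
    rtt, rto = update_rto(rtt, m)
    print(f"sample={m:.0f} rtt={rtt:.1f} rto={rto:.1f}")
```

With a gain of 0.9 a single large sample moves the estimate only slightly, which is the "not too fast" half of the requirement above.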

- How to measure accurately Round Trip Samples?

[Figure: two timelines between hosts A and B, each showing an original transmission, a retransmission, an ACK, and the resulting SampleRTT]

Acknowledgement Ambiguity phenomenon

Complication arises because TCP’s acknowledgement refers to data received, not to the instance of a specific datagram that carried the data

- Karn’s RTO estimator

Accounts for the Acknowledgement Ambiguity phenomenon

Combination of the initial RTO estimator and a timer back-off strategy.

As usual, to compute an initial timeout value, use the formulas :

New RTT = α * old RTT + (1 - α) * M

RTO = β * New RTT

If the timer expires and causes a retransmission, TCP does not take an RTT sample for that segment; instead it keeps backing off the timeout on each retransmission, until it can successfully transfer a segment, by the formula :

New RTO = γ * old RTO

The suggested value for γ is 2
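Karn's two rules can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the class and method names are hypothetical, `alpha`, `beta` and `gamma` stand for the filter gain, the variation factor and the back-off factor, and the caller is assumed to know whether the acknowledged segment was ever retransmitted.

```python
class KarnTimer:
    """Hypothetical RTO holder: exponential back-off plus sample filtering."""

    def __init__(self, rtt, alpha=0.9, beta=2.0, gamma=2.0):
        self.rtt = rtt
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.rto = beta * rtt

    def on_timeout(self):
        # Timer expired: back off the timeout for the next retransmission.
        self.rto *= self.gamma

    def on_ack(self, sample, was_retransmitted):
        if was_retransmitted:
            # Ambiguous ACK: it may answer either copy, so take no RTT sample
            # and keep the backed-off timeout.
            return
        self.rtt = self.alpha * self.rtt + (1 - self.alpha) * sample
        self.rto = self.beta * self.rtt

timer = KarnTimer(rtt=100.0)
timer.on_timeout()                            # rto: 200 -> 400
timer.on_ack(50.0, was_retransmitted=True)    # sample ignored
timer.on_ack(100.0, was_retransmitted=False)  # rto recomputed from the new RTT
print(timer.rtt, timer.rto)
```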

- Jacobson’s RTO estimator

Key Observations:

At high load, there is a wide range of variation in delay

Queuing theory suggests that by using the formula

RTO = β * New RTT

and limiting β to the suggested value of 2, the RTO estimation can adapt to loads of at most 30 %

Solutions:

Estimate both the average round trip time and the variance, and use the estimated variance in place of the constant β

DIFF = SAMPLE - old RTT

Smoothed RTT = old RTT + δ * DIFF

DEV = old DEV + ρ * ( |DIFF| - old DEV )

Timeout = Smoothed RTT + η * DEV

where DEV : the estimated mean deviation

δ : a fraction between 0 and 1 that controls how quickly the new sample affects the weighted average (Smoothed RTT)

ρ : a fraction between 0 and 1 that controls how quickly the new sample affects the mean deviation

η : a factor that controls how much the deviation affects the RTO (suggested value of η is 4)
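The four update equations translate directly into code. In this sketch the gains `delta = 1/8` and `rho = 1/4` are assumed defaults that the slide does not give; `eta = 4` is the suggested value. The function name and the sample values are hypothetical.

```python
def jacobson_update(old_rtt, old_dev, sample, delta=0.125, rho=0.25, eta=4.0):
    """Return (smoothed RTT, mean deviation, timeout) after one RTT sample."""
    diff = sample - old_rtt
    smoothed = old_rtt + delta * diff            # weighted average of the RTT
    dev = old_dev + rho * (abs(diff) - old_dev)  # estimated mean deviation
    timeout = smoothed + eta * dev               # deviation replaces the constant factor
    return smoothed, dev, timeout

# Illustrative values in ms: a sample 40 ms above the current estimate.
print(jacobson_update(100.0, 10.0, 140.0))
```

Because the timeout grows with the measured deviation, a path with widely varying delay gets a larger safety margin than a steady one with the same mean.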

Adapting to the path: Congestion Avoidance

- Use the coarse-grained timeout to indicate congestion in the network

- If a loss occurs (timeout) when cwnd = W, the network can absorb up to W segments

Set cwnd to 0.5 * W (multiplicative decrease)

- Upon receiving an ACK, increase cwnd by 1/cwnd (additive increase)

Review: Congestion control algorithms must obey the “ Packet Conservation Principle ”.

* To get to the equilibrium state with high utilization of the network bandwidth, without hitting the network with a big burst,

USE the ‘SLOW-START’ algorithm

* To maintain the equilibrium state (not inject a new packet into the network until an old packet has been taken out),

USE an unambiguous situation to measure RTT (Karn’s algorithm) & USE an accurate model to calculate RTO (Jacobson’s model)

* To adapt to the network condition,

USE a mechanism to detect the occurrence of loss (coarse-grained timeout)

USE congestion avoidance to avoid exceeding the available bandwidth

The combined slow-start with congestion avoidance algorithm

- Use 2 state variables :

cwnd : the congestion window at the sender site

ssthresh : the threshold used to switch between the two algorithms

- The sender always sends at most min(cwnd, advertised window) - # unacked packets

- If a packet is dropped, we lose self-clocking

- We need to implement both algorithms together to avoid losing packets as much as we can.

- The algorithm starts with slow-start; on a timeout,

ssthresh = cwnd/2

cwnd = 1

- Now, upon receiving an ACK

if (cwnd < ssthresh)
    cwnd += 1 ;          /* implement slow-start */
else
    cwnd += 1/cwnd ;     /* implement congestion avoidance */
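The combined algorithm can be sketched as a small state holder. The class name is hypothetical, the initial ssthresh of 64 is an assumed value, and cwnd is kept as a float so the 1/cwnd increments are visible (the traces in this deck round ssthresh down, e.g. 15/2 = 7).

```python
class SlowStartSender:
    """Hypothetical sender state for slow-start plus congestion avoidance."""

    def __init__(self, ssthresh=64.0):
        self.cwnd = 1.0
        self.ssthresh = ssthresh

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow-start: +1 per ACK
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: about +1 per RTT

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2     # remember half the window that failed
        self.cwnd = 1.0                   # restart with slow-start

    def usable_window(self, advertised, unacked):
        # min(cwnd, advertised window) - # unacked packets, never negative
        return max(0, int(min(self.cwnd, advertised)) - unacked)

s = SlowStartSender(ssthresh=4.0)
for _ in range(4):
    s.on_ack()
print(s.cwnd)              # 1 -> 2 -> 3 -> 4 in slow-start, then 4.25
s.on_timeout()
print(s.ssthresh, s.cwnd)
```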

Slow-Start and Congestion Avoidance

[Diagram: SENDER / RECEIVER time-sequence trace; numbers in parentheses are cwnd values.]

- Slow-start: PKT #0 (1); the receiver returns ACK #0 and waits for #1. The window then grows (2), (4), (8), (14) while #1 - #14 are sent
- The receiver returns ACK #12 followed by dup ACK #12 . . . while the sender continues with #15, #16 . . . #26
- Timeout: ssthresh = 15/2 = 7 ( cwnd = 1 ), “start slow-start again”; Retx #13; the receiver returns ACK #26 and waits for #27
- Slow-start: #27 - #28 and #29 - #32 as the window grows (2), (4), (7); at (7) “enter congestion avoidance”
- Congestion avoidance: #33 - #39 (8); #40 - #46 (8.125); #47 . . .
- Timeout: ssthresh = 8/2 = 4 ( cwnd = 1 ), “start slow-start again”; Retx #41; the receiver returns ACK #47 and waits for #48
- Slow-start: #48 - #49 (2); at (4) “enter congestion avoidance” . . .

The congestion window for slow-start/congestion avoidance algorithm

[Figure: congestion window versus time. Marked window values: 1, W1, 0.5 W1, W2, 0.5 W2. Two timeouts are marked on the time axis.]

Impacts of timeout

- Timeout can cause sender to:

Slow-start

Retransmit a portion of window (possibly large)

- Employ duplicate ACKs to signal the sender

Fast Retransmit : use a number of duplicate ACKs to signal the sender about the packet loss (shortens the idle time spent waiting for the timeout)

Fast Recovery : advance the congestion window more aggressively to reach high utilization faster

Fast Retransmit

- Duplicate ACKs can be caused by:

Segment Dropped

Segment Re-ordering

- TCP receiver should send an immediate duplicate ACK when an out-of-order segment arrives

- TCP receiver should send an immediate ACK when an incoming segment fills in all or part of a gap in the sequence space.

- Assuming that segment re-ordering is infrequent,

the TCP sender uses the receipt of 3 duplicate ACKs as an indication that a segment has been lost

“3 duplicate ACKs” means 4 identical ACKs without the arrival of any other intervening ACK packets

Set ssthresh = 0.5 * current cwnd, cwnd = 1, and retransmit the dropped segment before the timeout

Wait for a non-duplicate ACK and continue with slow-start

- Fast Retransmit removes the idle time the sender spends waiting for the coarse-grained timeout, since the sender can retransmit the dropped segment upon receiving the third duplicate ACK

- However, throughput still suffers because the sender has to enter slow-start every time a retransmission occurs

- Moreover, Fast Retransmit causes unnecessary retransmissions when multiple drops occur in a single window

Fast Recovery

- Key Observation:

A duplicate ACK is caused by the receipt of a segment at the receiver site

In other words, each duplicate ACK corresponds to one segment leaving the network

So, it is possible to use the duplicate ACKs to clock the sending of segments

- Solution:

If n duplicate ACKs arrive at the sender, advance cwnd by n

Fast Retransmit & Fast Recovery

- Upon receiving the third duplicate ACK for segment X,

Retransmit the segment following X (Fast Retransmit)

Set ssthresh = 0.5 * current cwnd

Set cwnd = ssthresh + 3 (Fast Recovery)

- After that, upon receiving each further duplicate ACK, inflate the congestion window by one

- If the sender’s usable window allows, send a new data segment

- Upon receiving a non-duplicate ACK, exit Fast Recovery

Set cwnd = ssthresh (the value set in step 1) and continue with congestion avoidance

- Fast Recovery improves the throughput of the system reasonably well, since duplicate ACKs are used to clock the sending of segments

- However, it suffers badly when multiple drops occur in a single window. Throughput drops dramatically, especially when there are 3 non-consecutive drops in a window
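The steps above can be sketched as a small state machine. Class and attribute names are hypothetical; the sketch assumes this deck's convention that ACK #n acknowledges segment n (so the missing segment is n + 1) and rounds ssthresh down as the traces do. The usage lines replay a window of 15 with 14 duplicate ACKs for segment 13.

```python
class RenoSender:
    """Hypothetical Reno sender: Fast Retransmit plus Fast Recovery."""
    DUP_THRESHOLD = 3

    def __init__(self, cwnd=1, ssthresh=64):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.in_fast_recovery = False
        self.retransmitted = []  # log of fast-retransmitted segment numbers

    def on_dup_ack(self, acked_seg):
        self.dup_acks += 1
        if self.in_fast_recovery:
            self.cwnd += 1       # one more segment has left the network
        elif self.dup_acks == self.DUP_THRESHOLD:
            self.retransmitted.append(acked_seg + 1)  # Fast Retransmit
            self.ssthresh = self.cwnd // 2            # rounded down, as in 15/2 = 7
            self.cwnd = self.ssthresh + 3             # Fast Recovery
            self.in_fast_recovery = True

    def on_new_ack(self):
        self.dup_acks = 0
        if self.in_fast_recovery:
            self.cwnd = self.ssthresh       # deflate and leave Fast Recovery
            self.in_fast_recovery = False
        elif self.cwnd < self.ssthresh:
            self.cwnd += 1                  # slow-start
        else:
            self.cwnd += 1 / self.cwnd      # congestion avoidance

s = RenoSender(cwnd=15)
for _ in range(14):
    s.on_dup_ack(13)
print(s.retransmitted, s.ssthresh, s.cwnd)  # window inflated during recovery
s.on_new_ack()
print(s.cwnd)                               # deflated back to ssthresh
```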

Modified Fast Recovery (Conservative version)

- Key Observation:

Fast Recovery suffers from multiple drops since it has to enter Fast Recovery several times

- Solution:

Change the sender’s behavior during Fast Recovery when a partial ACK is received

A partial ACK is one that acknowledges some but not all of the segments that were outstanding at the start of the Fast Recovery period

In the original Fast Recovery, partial ACKs cause the TCP sender to exit Fast Recovery by deflating the congestion window back to the size of ssthresh

In the modified Fast Recovery, partial ACKs do not take the TCP sender out of Fast Recovery

Instead, partial ACKs received during Fast Recovery trigger the sender to retransmit the segment immediately following the acknowledged segment

The TCP sender remains in Fast Recovery until all of the data outstanding when Fast Recovery was initiated has been acknowledged
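The difference between the two behaviors comes down to one decision per non-duplicate ACK. A minimal sketch, assuming ACK numbers name the highest segment received in order and `recover` is the highest segment outstanding when Fast Recovery began; the function name and return values are hypothetical.

```python
def handle_ack_during_recovery(ack, recover, modified):
    """Decide what a non-duplicate ACK does while the sender is in Fast Recovery.

    Returns ("exit", None) or ("retransmit", segment_number).
    """
    if ack >= recover:
        return ("exit", None)          # everything outstanding is acknowledged
    if modified:
        return ("retransmit", ack + 1) # partial ACK: resend the next hole, stay in recovery
    return ("exit", None)              # original Fast Recovery leaves on any new ACK

# Segments #14 and #28 lost out of a window ending at #28:
print(handle_ack_during_recovery(27, 28, modified=True))
print(handle_ack_during_recovery(27, 28, modified=False))
```

In the original behavior the second hole is left for a later Fast Recovery or a timeout, which is the cost described above.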

Selective Acknowledgement (SACK)

- TCP receiver provides more information about hole(s) in the sequence buffer to the sender

- The SACK option field contains a number of SACK blocks, where each SACK block reports a non-contiguous set of data that has been received and queued.

The 1st block is required to report the most recently received segment

The additional SACK blocks repeat the most recently reported SACK blocks

- The minimum number of SACK blocks in the SACK option field is two. It can have more than two blocks depending on the other option fields implemented in TCP.

- The simulation referenced by this presentation is assumed to use three blocks in the SACK option field

- The SACK TCP sender enters Fast Recovery upon receiving the 3rd duplicate ACK for a given segment. As in regular Fast Recovery, the sender cuts cwnd in half and retransmits the dropped segment

- During Fast Recovery, SACK maintains a variable, named ‘pipe’, representing the estimated number of segments outstanding in the path

- The sender also maintains a data structure, called ‘scoreboard’ , which remembers acknowledgements from previous SACK options

- The sender only sends new or retransmitted data when “pipe < cwnd”

- ‘pipe’ is incremented by one when the sender either sends a new segment or retransmits an old packet

- ‘pipe’ is decremented by one when the sender receives a dup ACK packet with a SACK option reporting that new data has been received at the receiver

- Upon receiving a partial ACK, ‘pipe’ is decremented by two

- The sender exits Fast Recovery when it receives a recovery acknowledgement acknowledging all data that was outstanding when it entered Fast Recovery

- When the sender is allowed to send a segment,

It retransmits the next segment inferred to be missing

If there is no such segment and the advertised window is sufficiently large, the sender sends a new packet

- When a retransmitted packet is itself dropped, the TCP sender detects the drop with the RTO, retransmits the dropped segment and then slow-starts.
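The ‘pipe’ bookkeeping can be sketched as a counter. The class name is hypothetical and the scoreboard is left out; the starting value follows the annotation on the SACK trace in this deck (pipe = cwnd - ndup), and the usage lines replay that trace's numbers for a window of 15.

```python
class SackPipe:
    """Hypothetical 'pipe' counter for SACK Fast Recovery (no scoreboard)."""

    def __init__(self, cwnd, dup_acks=3):
        self.pipe = cwnd - dup_acks  # estimate of segments still in the path
        self.cwnd = cwnd // 2        # window cut in half on entering Fast Recovery

    def can_send(self):
        return self.pipe < self.cwnd

    def on_send(self):
        self.pipe += 1               # new segment or retransmission

    def on_dup_ack_with_sack(self):
        self.pipe -= 1               # SACK reports new data at the receiver

    def on_partial_ack(self):
        self.pipe -= 2

p = SackPipe(cwnd=15)
for _ in range(10):                  # 4th through 13th duplicate ACK
    p.on_dup_ack_with_sack()
sent = 0
while p.can_send():                  # the trace notes five segments can be sent
    p.on_send()
    sent += 1
print(sent, p.cwnd, p.pipe)
```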

TCP Flavors

- Tahoe, Reno, New-Reno, Vegas

- TCP Tahoe (distributed with 4.3 BSD Unix) includes:

Slow-start (exponentially increases the congestion window)

Congestion Avoidance (additive increase)

Fast Retransmit (use 3 dup ACKs)

- TCP Reno (1990) includes :

All mechanisms in Tahoe

Fast Recovery ( governing the transmission after retransmitting the lost segment )

Delayed Acknowledgement ( to avoid silly window syndrome )

- TCP New Reno :

Makes a small change in responding to partial ACKs during Fast Recovery

Tahoe: 1 drop

[Diagram: SENDER / RECEIVER time-sequence trace; numbers in parentheses are cwnd values.]

- Slow-start: #0 (1); #1 - #2 (2); ACK #1 - #2 (4); #3 - #6; ACK #3 - #6 (8); #7 - #14; ACK #7 - #13 (15)
- The sender continues with #15 - #18 . . . #26 - #28
- 3 dup ACKs #13: “enter fast retransmit”, ssthresh = 15/2 = 7 (cwnd = 1), “continue with slow-start”; Retx #14
- . . . 14th dup ACK #13, then ACK #28
- Slow-start: (2) #29 - #30; ACK #29 - #30 (4); #31 - #34; ACK #31 - #34 (7)
- “enter congestion avoidance”: #35, #36 - #41; ACK #35 - #41 (8) . . .

Reno : 1 drop

[Diagram: SENDER / RECEIVER time-sequence trace; numbers in parentheses are cwnd values.]

- Slow-start: #0 (1); #1 - #2 (2); ACK #1 - #2 (4); #3 - #6; ACK #3 - #6 (8); #7 - #14; ACK #7 - #13 (15)
- The sender continues with #15 - #18 . . . #26 - #28
- 3 dup ACKs #13: “enter fast recovery”, ssthresh = 15/2 = 7 (cwnd = 7)
- The 4th through 14th dup ACKs #13 inflate the window (11), (12) . . . (21); #29 - #30 and #31 - #34 are sent
- ACK #28: “exit fast recovery”, ssthresh = 7 (cwnd = 7), continue with congestion avoidance; #35 is sent
- ACK #29 - #35 (8); #36 - #43; ACK #36 - #43 (9) . . .

Tahoe: 2 drops

[Diagram: SENDER / RECEIVER time-sequence trace; numbers in parentheses are cwnd values.]

- Slow-start: #0 (1); #1 - #2 (2); ACK #1 - #2 (4); #3 - #6; ACK #3 - #6 (8); #7 - #14
- 3 dup ACKs #6: “enter fast retransmit”, ssthresh = 8/2 = 4 (cwnd = 1), continue with slow-start; Retx #7 . . . 6th dup ACK #6
- ACK #8 (2); #9 (retx), #10
- ACK #14, 1st dup ACK #14 (3); #15 - #16, #17
- ACK #15 - #17 (4.67): “enter congestion avoidance”

Reno : 2 drops (causing “retransmission timeout”)

[Diagram: SENDER / RECEIVER time-sequence trace; numbers in parentheses are cwnd values.]

- Slow-start: #0 (1); #1 - #2 (2); ACK #1 - #2 (4); #3 - #6; ACK #3 - #6 (8); #7 - #14
- 3 dup ACKs #6: “enter fast recovery”, ssthresh = 8/2 = 4 (cwnd = 4); Retx #7
- 4th dup ACK #6 (8); 5th dup ACK #6 (9); 6th dup ACK #6 (10); #15 and #16 are sent
- ACK #8: “exit fast recovery”, ssthresh = 4 (cwnd = 4); cannot send more data since the number of outstanding segments is 8
- 1st dup ACK #8, 2nd dup ACK #8 . . .
- Timeout: Retx #9, “enter slow-start” (cwnd = 1)
- ACK #16 (2); #17 - #18; ACK #17 - #18 (4)

Reno : 2 drops (causing “two successive Fast Recovery”)

[Diagram: SENDER / RECEIVER time-sequence trace; numbers in parentheses are cwnd values.]

- Slow-start: #0 (1); #1 - #2 (2); ACK #1 - #2 (4); #3 - #6; ACK #3 - #6 (8); #7 - #14; ACK #7 - #13 (15)
- The sender continues with #15 - #18 . . . #26 - #28
- 3 dup ACKs #13: “enter fast recovery”, ssthresh = 15/2 = 7 (cwnd = 7); Retx #14
- The 4th through 13th dup ACKs #13 inflate the window (11), (12) . . . (20); #29 - #32 and #33 are sent
- ACK #27: “exit fast recovery”, ssthresh = 7 (cwnd = 7); #34 is sent
- 3 dup ACKs #27: “enter fast recovery”, ssthresh = 7/2 = 3 (cwnd = 3); Retx #28
- The 4th through 6th dup ACKs #27 inflate the window (7), (8), (9); #35 and #36 are sent
- ACK #34: “exit fast recovery”, ssthresh = 3 (cwnd = 3), continue with congestion avoidance
- ACK #35 through ACK #41 then clock out #37 through #45, with the window growing (4), (5)

New Reno : 2 drops

[Diagram: SENDER / RECEIVER time-sequence trace; numbers in parentheses are cwnd values.]

- Slow-start: #0 (1); #1 - #2 (2); ACK #1 - #2 (4); #3 - #6; ACK #3 - #6 (8); #7 - #14; ACK #7 - #13 (15)
- The sender continues with #15 - #18 . . . #26 - #28
- 3 dup ACKs #13: “enter fast recovery”, ssthresh = 15/2 = 7 (cwnd = 7); Retx #14
- The 4th through 13th dup ACKs #13 inflate the window (11), (12) . . . (20); #29 - #32 and #33 are sent
- ACK #27: “receive a partial ACK; retransmit segment #28 immediately”; Retx #28 (7)
- 5 dup ACKs #27 inflate the window (8), (9), (10), (11), (12); #34, #35 - #36 and #37 - #39 are sent
- ACK #33: “exit fast recovery”, ssthresh = 7 (cwnd = 7), continue with congestion avoidance

SACK TCP : 2 drops

[Diagram: SENDER / RECEIVER time-sequence trace; single numbers in parentheses are cwnd values, pairs are (cwnd, pipe).]

- Slow-start: #0 (1); #1 - #2 (2); ACK #1 - #2 (4); #3 - #6; ACK #3 - #6 (8); #7 - #14; ACK #7 - #13 (15)
- The sender continues with #15 - #18 . . . #26 - #28
- 3 dup ACKs #13: “enter Fast Recovery”, pipe = cwnd - ndup = 15 - 3 = 12, ssthresh = 15/2 = 7, cwnd = 7; Retx #14
- The 4th through 13th dup ACKs #13 lower pipe: (7, 11), (7, 10) . . . (7, 2); “Can send five more segments”
- #29 - #33 are sent: (7, 3), (7, 4), (7, 5), (7, 6), (7, 7)
- ACK #27; #34 and #35 are sent: (7, 5), (7, 6), (7, 7)
- 5 dup ACKs #27: (7, 6), (7, 5), (7, 4), (7, 3), (7, 2); Retx #28, #36 and #37 - #39 are sent: (7, 7)
- 2 dup ACKs #27: (7, 6), (7, 5); #40 and #41 are sent: (7, 7)
- ACK #35: “exit fast recovery”, ssthresh = 7 (cwnd = 7), continue with congestion avoidance
- ACK #36 through ACK #46 arrive while #42 through #54 are sent; the window reaches (8)

Example: SACK 2 drops (#14 and #28)

At sender                                   Receiver side
Receive ACK #7     No Gap                   7     0-6
ACK #8             No Gap                   8     0-7
ACK #9             No Gap                   9     0-8
ACK #10            No Gap                   10    0-9
ACK #11            No Gap                   11    0-10
ACK #12            No Gap                   12    0-11
ACK #13            No Gap                   13    0-12
1st dup ACK #13    a hole at #14            15    0-13
2nd dup ACK #13    a hole at #14            16    0-13 15
3rd dup ACK #13    a hole at #14            17    0-13 15-16

*** Enter Fast Recovery ! ssthresh = cwnd = 15/2 = 7; outstanding segments = #14 - #28; Retransmit #14

. . .
13th dup ACK #13   a hole at #14            27    0-13 15-26
ACK #27            No Gap                   27    0-27

*** The first partial ACK is caused by retransmitted segment #14; ‘pipe’ is decremented by two

1st dup ACK #27    a hole at #28            29    0-27
2nd dup ACK #27    a hole at #28            30    0-27 29
3rd dup ACK #27    a hole at #28            31    0-27 29-30
4th dup ACK #27    a hole at #28            32    0-27 29-31
. . .
7th dup ACK #27    a hole at #28            35    0-27 29-34

Retransmit #28

ACK #35            No Gap                   35    0-34

*** The recovery ACK is caused by retransmitted segment #28. It brings the TCP sender out of Fast Recovery

*** Exit Fast Recovery ! ssthresh = cwnd = 15/2 = 7; continue with congestion avoidance

Tahoe: 3 drops

[Diagram: SENDER / RECEIVER time-sequence trace; numbers in parentheses are cwnd values.]

- Slow-start: #0 (1); #1 - #2 (2); ACK #1 - #2 (4); #3 - #6; ACK #3 - #6 (8); #7 - #14; ACK #7 - #13 (15)
- The sender continues with #15 - #18 . . . #26 - #28
- “enter fast retransmit”: ssthresh = 15/2 = 7 (cwnd = 1), continue with slow-start; Retx #14 . . . 12th dup ACK #13
- ACK #25 (2); #26 (retx), #27
- ACK #27 (3), 1st dup ACK #27; #28 (retx), #29, #30
- ACK #28, ACK #29, ACK #30 (4), (5), (6); #31 - #36
- ACK #31 - #36 (7): “enter congestion avoidance”; #37 - #42

Reno : 3 drops

[Diagram: SENDER / RECEIVER time-sequence trace; numbers in parentheses are cwnd values.]

- Slow-start: #0 (1); #1 - #2 (2); ACK #1 - #2 (4); #3 - #6; ACK #3 - #6 (8); #7 - #14; ACK #7 - #13 (15)
- The sender continues with #15 - #18 . . . #26 - #28
- 3 dup ACKs #13: “enter fast recovery”, ssthresh = 15/2 = 7 (cwnd = 7); Retx #14
- The 4th through 12th dup ACKs #13 inflate the window (11), (12) . . . (19); #29 - #32 are sent
- ACK #25: “exit fast recovery”, ssthresh = 7 (cwnd = 7), continue with congestion avoidance
- 3 dup ACKs #25: “enter fast recovery”, ssthresh = 7/2 = 3 (cwnd = 3); Retx #26; 4th dup ACK #25 (7)
- ACK #27: “exit fast recovery”, ssthresh = 3 (cwnd = 3), continue with congestion avoidance . . .
- Timeout: “enter slow-start” (cwnd = 1); Retx #28
- ACK #32 (2); #33, #34; ACK #33 - #34 (3), continue with congestion avoidance

New Reno : 3 drops

[Diagram: SENDER / RECEIVER time-sequence trace; numbers in parentheses are cwnd values.]

- Slow-start: #0 (1); #1 - #2 (2); ACK #1 - #2 (4); #3 - #6; ACK #3 - #6 (8); #7 - #14; ACK #7 - #13 (15)
- The sender continues with #15 - #18 . . . #26 - #28
- 3 dup ACKs #13: “enter fast recovery”, ssthresh = 15/2 = 7 (cwnd = 7); Retx #14
- The 4th through 12th dup ACKs #13 inflate the window (11), (12) . . . (19); #29 - #32 are sent
- ACK #25: “receive a partial Acknowledgement”, retransmit #26 immediately; Retx #26 (7)
- 4 dup ACKs #25 inflate the window (8), (9), (10), (11); #33 and #34 - #36 are sent
- ACK #27: “receive a partial Acknowledgement”, retransmit #28 immediately; Retx #28 (7)
- 4 dup ACKs #27 inflate the window (8), (9), (10), (11); #37 and #38 are sent
- ACK #36: “exit fast recovery”, ssthresh = 7 (cwnd = 7); #39 - #43 are sent

SACK TCP : 3 drops

[Diagram: SENDER / RECEIVER time-sequence trace; single numbers in parentheses are cwnd values, pairs are (cwnd, pipe).]

- Slow-start: #0 (1); #1 - #2 (2); ACK #1 - #2 (4); #3 - #6; ACK #3 - #6 (8); #7 - #14; ACK #7 - #13 (15)
- The sender continues with #15 - #18 . . . #26 - #28
- 3 dup ACKs #13: “enter Fast Recovery”, pipe = cwnd - ndup = 15 - 3 = 12, ssthresh = 15/2 = 7, cwnd = 7; Retx #14
- The 4th through 12th dup ACKs #13 lower pipe: (7, 11), (7, 10) . . . (7, 3); “Realize that #26 has been lost, and right now we can send 4 segments”
- Retx #26 and #29 - #31 are sent: (7, 4), (7, 5), (7, 6), (7, 7)
- ACK #25, the first partial ACK: (7, 5); #32 and #33 are sent: (7, 7)
- 3 dup ACKs #25: (7, 6), (7, 5), (7, 4); “These 3 dup ACKs contain information indicating holes at segment #26 and #28.”; Retx #28, #34 and #35 are sent: (7, 7)
- ACK #27, the second partial ACK: (7, 5); #36 and #37 are sent: (7, 7)
- 2 dup ACKs #27: (7, 5); #38 and #39 are sent
- ACK #33: “exit Fast Recovery”, ssthresh = 7, cwnd = 7, continue with congestion avoidance

- TCP Vegas (1995) implements 3 new techniques to increase throughput and decrease losses :

New retransmission mechanism

Results in a more timely decision to retransmit a dropped segment

Congestion avoidance mechanism

Gives TCP the ability to anticipate congestion, and adjust its transmission rate accordingly

Modified Slow-Start mechanism

Avoids packet losses while trying to find the available bandwidth during the initial use of slow-start

TCP Vegas New Retransmission Mechanism

Goals:

1. To be able to detect lost segments even though there may be no second or third duplicate ACK

2. To reduce the time to detect lost segments ( can retransmit before receiving the third duplicate ACK )

- Vegas reads and records the system clock each time a segment is sent

- When an ACK arrives, Vegas reads the clock again and does the RTT calculation

RTT = ACK arriving time - Segment sending time

- Vegas then uses this more accurate RTT estimate to decide to retransmit in the following two situations :

When a duplicate ACK #n is received,

Vegas checks the difference between the current time and the sending time of segment #n+1. If it is greater than the timeout value, Vegas retransmits segment #n+1 without having to wait for 3 duplicate ACKs

When a non-duplicate ACK #n is received and it is the first or second one after a retransmission,

Vegas checks the difference between the current time and the sending time of segment #n+1. If it is greater than the timeout value, Vegas retransmits segment #n+1 without having to wait for 3 duplicate ACKs

- In addition to being able to detect lost segments sooner than the original TCP Reno,

the congestion window in TCP Vegas is decreased only for losses that happened at the current sending rate, and not for losses that happened at an earlier, higher rate

- This concept is also implemented in TCP New-Reno, where partial ACKs do not bring the TCP sender out of Fast Recovery
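The duplicate-ACK check can be sketched as a timestamp comparison. The function names, the dictionary of send times and the timeout value are all hypothetical; times are in seconds and ACK #n is taken to mean that segment #n+1 is the next one expected.

```python
def should_retransmit(now, send_time, timeout):
    """True when the segment has been outstanding longer than the timeout."""
    return (now - send_time) > timeout

def on_dup_ack(n, now, send_times, timeout):
    """Return the segment to retransmit on a duplicate ACK #n, or None."""
    segment = n + 1
    if should_retransmit(now, send_times[segment], timeout):
        return segment  # no need to wait for the third duplicate ACK
    return None

# Recorded sending times for the segments still outstanding (illustrative values).
send_times = {13: 0.00, 14: 0.05}
print(on_dup_ack(12, now=0.30, send_times=send_times, timeout=0.25))
print(on_dup_ack(12, now=0.20, send_times=send_times, timeout=0.25))
```

The same comparison applies to the first or second non-duplicate ACK after a retransmission.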

Vegas : retransmit mechanism (diagram)

[Diagram: SENDER / RECEIVER time-sequence trace, with two intervals marked “1 RTT”.]

- The sender transmits #0; #1 - #2; #3 - #6; #7 - #14. ACK #7 through ACK #12 arrive, and #15 is sent
- ACK #13 is expected
- 1st dup ACK #12 arrives (due to #14): Vegas checks the sending time of segment #13 and decides to retransmit it. The congestion window is also reduced by half: 14/2 = 7. Retx #13
- ACK #15 is expected
- ACK #14 arrives: this is the 1st ACK after the retransmission. Vegas checks the timestamp of #15 and decides to retransmit it. The congestion window is not reduced by half, since the loss happened before the last window decrease ( we know because it is a partial ACK ). Such a loss does not imply that the network is congested for the current congestion window size, and therefore does not imply that it should be decreased again

TCP Vegas Congestion Avoidance Mechanism

Review of TCP Reno’s congestion detection and control mechanism :

- It uses the loss of segments as a signal of network congestion

- It is reactive rather than proactive, since it cannot detect the incipient stage of congestion and prevent it (before losses occur)

- As a result, Reno needs to create losses to find the available bandwidth of the connection

Several proactive algorithms :

- Based on the fact that as the network approaches congestion, the queue size in intermediate nodes increases, which increases the RTT of each successive segment :

Wang and Crowcroft’s DUAL algorithm

Jain’s CARD (Congestion Avoidance using Round-Trip Delay)

- Based on the fact that as the network approaches congestion, the sending rate flattens :

Wang and Crowcroft’s Tri-S scheme

Vegas’s congestion avoidance actions :

- Generally, Vegas measures and controls the amount of extra data the connection has in transit

- Extra data means data that would not have been sent if the bandwidth used by the connection exactly matched the available bandwidth of the network

- Too much extra data : congestion

Too little extra data : cannot respond rapidly enough to transient increases in the available network bandwidth

- Actions are based on changes in the estimated amount of extra data in the network, not only on dropped segments

- BaseRTT means the RTT of a segment when the connection is not congested

In practice, Vegas sets BaseRTT to the minimum of all measured round trip times

- Assuming that the connection is not overflowing, the Expected throughput is given by :

Expected = WindowSize / BaseRTT

where WindowSize is the size of the current congestion window (assumed to be the number of bytes in transit)

- Once per round-trip time, the Actual sending rate is calculated :

by computing the RTT for the distinguished segment when its acknowledgement arrives, and dividing the number of bytes transmitted by the sample RTT

- Compare Actual to Expected :

Diff = Expected - Actual

Since Expected >= Actual (from the definition), Diff is positive or zero

- Define two thresholds, α < β ( in terms of KB/s )

- The two thresholds represent the lower bound and the upper bound of extra data for a connection

- In practice, during congestion avoidance, we express the two thresholds in terms of buffers rather than extra bytes in transit

α is set to 1, β is set to 3

These values can be interpreted as : the TCP sender should try to use at least one extra buffer at the bottleneck router, but no more than three extra buffers

- α < Diff < β : leaves the congestion window unchanged

- Diff > β : decreases the congestion window linearly during the next RTT

The farther away the actual throughput gets from the expected throughput, the more congestion there is in the network

- Diff < α : increases the congestion window linearly during the next RTT

The closer the actual throughput gets to the expected throughput, the more the connection is in danger of not utilizing the available bandwidth
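The once-per-RTT decision can be sketched as a single function. All names are hypothetical. The sketch measures the window in segments and both rates and thresholds in segments per second, which is an assumption made for readability (the slide states the thresholds in KB/s or in router buffers), and it models "linearly" as one segment per RTT.

```python
def vegas_adjust(cwnd, base_rtt, actual_rate, alpha, beta):
    """Return the congestion window to use during the next RTT."""
    expected = cwnd / base_rtt    # throughput if nothing is queued in the network
    diff = expected - actual_rate # estimate of the extra data in transit
    if diff < alpha:
        return cwnd + 1           # too little extra data: grow linearly
    if diff > beta:
        return cwnd - 1           # too much extra data: shrink linearly
    return cwnd                   # between the thresholds: leave unchanged

# cwnd = 20 segments, BaseRTT = 0.1 s, so Expected is about 200 segments/s.
for actual in (195.0, 180.0, 150.0):
    print(actual, vegas_adjust(20, 0.1, actual, alpha=10.0, beta=30.0))
```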

TCP Vegas Modified Slow-Start Mechanism

- Slow-Start in TCP Reno :

TCP is a “self-clocking” protocol

It uses ACKs as a “clock” to strobe new segments into the network

At the beginning of a connection or after a retransmit timeout, Slow-Start is used to gradually increase the amount of data in transit (the size of the congestion window -- cwnd)

The Slow-Start period ends when the exponentially increasing congestion window reaches the threshold window -- ssthresh

Once a retransmit timeout occurs, ssthresh is set to one half of the current cwnd

However, when the connection starts, there is no way to know what value ssthresh should be initialized to

Too small an initial ssthresh : throughput suffers

Too large an initial ssthresh : losses occur

For Reno, the initial ssthresh is set to a very high value. The TCP sender stays blindly in the slow-start phase until a retransmit timeout occurs (a timeout means segment losses)

At that time, the TCP sender has some idea about the available bandwidth of the connection

- Modified Slow-Start in TCP Vegas :

Find a connection’s available bandwidth without allowing losses during the initial slow-start

Every other RTT, exponential growth is allowed

In between, the congestion window stays fixed and the comparison of the expected and actual rates is made

When “Expected - Actual == 1”, Vegas switches from Slow-Start to linear increase/decrease mode

Incorporate the congestion detection mechanism into slow-start

Vegas : modified slow-start mechanism (diagram)

[Diagram: SENDER / RECEIVER time-sequence trace; numbers in parentheses are cwnd values.]

- (1) #0
- (2) #1 - #2: exponential growth
- #3 - #4: comparison is made
- (4) #5 - #8: exponential growth
- Comparison is made . . .