
Page 1: NS-2 LAB: TCP/IP Congestion control

1

NS-2 LAB: TCP/IP Congestion control

2005-2006 - Semester A

Miriam Allalouf

Page 2: NS-2 LAB: TCP/IP Congestion control

3

Networks Design Course Datagram Networks

Network Layer Service Model
The transport layer relies on the services of the network layer: communication services between hosts. A router's role is to "switch" packets from an input port to an output port.
Which services can be expected from the network layer?
- Is it reliable? Can the transport layer count on the network layer to deliver the packets to the destination?
- When multiple packets are sent, will they be delivered to the transport layer in the receiving host in the same order they were sent?
- Is the amount of time between two sends equal to the amount of time between two receives?
- Will there be any indication of congestion in the network?
- What is the abstract view of the channel connecting the transport layer at the sending and receiving end points?

Page 3: NS-2 LAB: TCP/IP Congestion control

4

Networks Design Course Datagram Networks

Best-effort service is the only service model provided by the Internet:
- No guarantee of preserving the timing between packets
- No guarantee of in-order delivery
- No guarantee that transmitted packets are eventually delivered

A network that delivered no packets to the destination would still satisfy the definition of best-effort delivery service.

Is best-effort service equivalent to no service at all?

Page 4: NS-2 LAB: TCP/IP Congestion control

5

Networks Design Course Datagram Networks

No service at all? Or what are the advantages?
- Minimal requirements on the network layer make it easier to interconnect networks with very different link-layer technologies: satellite, Ethernet, fiber, or radio use different transmission rates and loss characteristics.
- The Internet grew out of the need to connect computers together.
- Sophisticated end-system devices implement any additional functionality and new application-level network services at a higher layer: applications such as e-mail, the Web, and even a network-layer-centric service such as DNS are implemented in hosts (servers) at the edge of the network.
- New services are simple to add, by attaching a host to the network and defining a new higher-layer protocol (such as HTTP); this enabled the fast adoption of the WWW on the Internet.

Page 5: NS-2 LAB: TCP/IP Congestion control

6

TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581

- point-to-point: one sender, one receiver
- reliable, in-order byte stream: no "message boundaries"
- pipelined: TCP congestion and flow control set window size
- full duplex data: bi-directional data flow in the same connection; MSS: maximum segment size
- connection-oriented: handshaking (exchange of control messages) initializes sender and receiver state before data exchange
- flow controlled: sender will not overwhelm the receiver

Page 6: NS-2 LAB: TCP/IP Congestion control

7

TCP segment structure

[Figure: TCP segment structure, 32 bits wide]
- source port #, dest port #
- sequence number (counting is by bytes of data, not segments!)
- acknowledgement number
- head len, not used, flag bits U A P R S F
- rcvr window size (# bytes rcvr willing to accept)
- checksum (Internet checksum), ptr to urgent data
- Options (variable length)
- application data (variable length)

Flag meanings:
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
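As a concrete illustration of this layout, here is a minimal Python sketch (my own, not part of the slides) that unpacks the fixed 20-byte header of a raw TCP segment:

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    # Fixed 20-byte header: ports, seq, ack, head-len/flags, window, checksum, urgent pointer
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": (offset_flags >> 12) * 4,   # "head len" is in 32-bit words
        "URG": bool(offset_flags & 0x20), "ACK": bool(offset_flags & 0x10),
        "PSH": bool(offset_flags & 0x08), "RST": bool(offset_flags & 0x04),
        "SYN": bool(offset_flags & 0x02), "FIN": bool(offset_flags & 0x01),
        "window": window, "checksum": checksum, "urgent_ptr": urgent_ptr,
    }

# Synthetic example: a SYN segment with seq=42 and a 1000-byte receive window
hdr = struct.pack("!HHIIHHHH", 1234, 80, 42, 0, (5 << 12) | 0x02, 1000, 0, 0)
print(parse_tcp_header(hdr))
```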

Page 7: NS-2 LAB: TCP/IP Congestion control

8

TCP seq. #'s and ACKs
Seq. #'s: the byte-stream "number" of the first byte in the segment's data
ACKs: the seq # of the next byte expected from the other side (cumulative ACK)
Q: how does the receiver handle out-of-order segments?
A: the TCP spec doesn't say; it is up to the implementor.

[Figure: simple telnet scenario over time. The user at Host A types 'C'; Host A sends Seq=42, ACK=79, data='C'. Host B ACKs receipt of 'C' and echoes it back (Seq=79, ACK=43, data='C'). Host A ACKs receipt of the echoed 'C' (Seq=43, ACK=80).]

Page 8: NS-2 LAB: TCP/IP Congestion control

9

At TCP Receiver: TCP ACK generation [RFC 1122, RFC 2581]

Event: in-order segment arrival, no gaps, everything else already ACKed
Receiver action: delayed ACK; wait up to 500 ms for the next segment; if no next segment, send ACK

Event: in-order segment arrival, no gaps, one delayed ACK pending
Receiver action: immediately send a single cumulative ACK

Event: out-of-order segment arrival, higher-than-expected seq. # (gap detected)
Receiver action: send a duplicate ACK, indicating the seq. # of the next expected byte

Event: arrival of a segment that partially or completely fills a gap
Receiver action: immediate ACK if the segment starts at the lower end of the gap

Page 9: NS-2 LAB: TCP/IP Congestion control

10

TCP: retransmission scenarios

[Figure: lost ACK scenario. Host A sends Seq=92, 8 bytes of data; Host B's ACK=100 is lost; A's timeout expires, it retransmits Seq=92, 8 bytes of data, and B ACKs 100 again.]

[Figure: premature timeout with cumulative ACKs. Host A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); the timeout for Seq=92 fires before ACK=100 arrives, so A retransmits Seq=92; B's cumulative ACK=120 then covers both segments.]

Page 10: NS-2 LAB: TCP/IP Congestion control

11

TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value?
- longer than the RTT (note: the RTT will vary)
- too short: premature timeout, unnecessary retransmissions
- too long: slow reaction to segment loss

Q: how to estimate the RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
- ignore retransmissions and cumulatively ACKed segments
- SampleRTT will vary; we want the estimated RTT to be "smoother": use several recent measurements, not just the current SampleRTT

Page 11: NS-2 LAB: TCP/IP Congestion control

12

TCP Round Trip Time and Timeout
EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
- exponentially weighted moving average: the influence of a given sample decreases exponentially fast
- typical value of x: 0.125 (1/8)

Setting the timeout: EstimatedRTT plus a "safety margin"; a large variation in EstimatedRTT -> a larger safety margin

Timeout = EstimatedRTT + 4*Deviation

Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|
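As a runnable sketch of these update rules (my own framing; the slides use the same weight x for both the RTT estimate and the deviation, and seeding the estimate from the first sample is an assumption):

```python
X = 0.125                 # typical weight (1/8) from the slide

estimated_rtt = None      # seeded from the first SampleRTT (assumption; RFCs differ in detail)
deviation = 0.0

def update_rtt_and_timeout(sample_rtt: float) -> float:
    """Fold one SampleRTT into EstimatedRTT and Deviation, return the new Timeout."""
    global estimated_rtt, deviation
    if estimated_rtt is None:
        estimated_rtt = sample_rtt
    else:
        deviation = (1 - X) * deviation + X * abs(sample_rtt - estimated_rtt)
        estimated_rtt = (1 - X) * estimated_rtt + X * sample_rtt
    return estimated_rtt + 4 * deviation    # Timeout = EstimatedRTT + 4*Deviation

# Example run over a few samples (seconds)
for s in (0.10, 0.12, 0.30, 0.11):
    print(round(update_rtt_and_timeout(s), 4))
```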

Page 12: NS-2 LAB: TCP/IP Congestion control

13

Principles of Congestion Control

Congestion: informally, "too many sources sending too much data too fast for the network to handle"
- different from flow control!
- manifestations: lost packets (buffer overflow at routers), long delays (queueing in router buffers)
- a top-10 problem!

Page 13: NS-2 LAB: TCP/IP Congestion control

14

Causes/costs of congestion: scenario 1

two senders, two receivers

one router, infinite buffers

no retransmission

large delays when congested

maximum achievable throughput

Page 14: NS-2 LAB: TCP/IP Congestion control

15

Causes/costs of congestion: scenario 2

one router, finite buffers
sender retransmission of lost packets

Page 15: NS-2 LAB: TCP/IP Congestion control

16

Causes/costs of congestion: scenario 2
(λ_in is the original sending rate, λ'_in the offered load including retransmissions, λ_out the goodput)
- always: λ_in = λ_out (goodput)
- "perfect" retransmission, only when loss: λ'_in > λ_out
- retransmission of delayed (not lost) packets makes λ'_in larger (than the perfect case) for the same λ_out

"Costs" of congestion:
- more work (retransmissions) for a given "goodput"
- unneeded retransmissions: links carry multiple copies of a packet

Page 16: NS-2 LAB: TCP/IP Congestion control

17

Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

Page 17: NS-2 LAB: TCP/IP Congestion control

18

Causes/costs of congestion: scenario 3

Another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

Page 18: NS-2 LAB: TCP/IP Congestion control

19

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
- no explicit feedback from the network
- congestion inferred from end-system observed loss and delay
- approach taken by TCP

Network-assisted congestion control:
- routers provide feedback to end systems
- a single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
- an explicit rate at which the sender should send

Page 19: NS-2 LAB: TCP/IP Congestion control

20

TCP Congestion Control
- end-end control (no network assistance)
- transmission rate limited by the congestion window size, Congwin, over segments
- w segments, each with MSS bytes, sent in one RTT:

  throughput = (w * MSS) / RTT  bytes/sec
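For example (illustrative numbers, not from the slides): with w = 10 segments, MSS = 1460 bytes, and RTT = 100 ms, throughput = (10 * 1460) / 0.1 = 146,000 bytes/sec, i.e. roughly 1.17 Mbit/s.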

Page 20: NS-2 LAB: TCP/IP Congestion control

21

TCP Self-clocking
- the congestion window adjusts the sending rate
- slow ACKs (a slow link or high delay) -> slow cwnd increase
- fast ACKs -> fast cwnd increase

Page 21: NS-2 LAB: TCP/IP Congestion control

22

TCP congestion control: two "phases"
- slow start
- congestion avoidance

Important variables:
- Congwin: based on the perceived level of congestion; the host receives indications of internal congestion
- threshold: defines the boundary between the slow start phase and the congestion avoidance phase

"Probing" for usable bandwidth:
- ideally: transmit as fast as possible (Congwin as large as possible) without loss
- increase Congwin until loss (congestion)
- on loss: decrease Congwin, then begin probing (increasing) again

Page 22: NS-2 LAB: TCP/IP Congestion control

23

TCP Slowstart

exponential increase (per RTT) in window size (not so slow!)

loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)

Slowstart algorithm:
  initialize: Congwin = 1
  for (each segment ACKed)
      Congwin++
  until (loss event OR Congwin > threshold)

[Figure: slow start between Host A and Host B: one segment in the first RTT, two segments in the next, then four, and so on.]

Page 23: NS-2 LAB: TCP/IP Congestion control

24

TCP Congestion Avoidance

Congestion avoidance:
  /* slowstart is over */
  /* Congwin > threshold */
  Until (loss event) {
      every w segments ACKed:
          Congwin++
  }
  threshold = Congwin/2
  Congwin = 1
  perform slowstart   (1)

(1): TCP Reno skips slowstart (fast recovery) after three duplicate ACKs.

[Figure: Congwin over time for Tahoe vs. Reno: after a loss, Tahoe drops to 1 and slow starts again, while Reno halves the window and continues in congestion avoidance.]
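Putting the two phases together, here is a minimal Python sketch of a Tahoe-style window evolution, driven per ACK and per loss event (the class, variable names, and demo values are my own; only the update rules come from the pseudocode above):

```python
class TahoeWindow:
    """Congestion window in segments: slow start, congestion avoidance, Tahoe loss reaction."""
    def __init__(self, threshold=16.0):
        self.congwin = 1.0
        self.threshold = threshold

    def on_ack(self):
        if self.congwin < self.threshold:
            self.congwin += 1.0                     # slow start: +1 per ACK, doubling per RTT
        else:
            self.congwin += 1.0 / self.congwin      # congestion avoidance: ~+1 per RTT

    def on_loss(self):                              # timeout
        self.threshold = max(self.congwin / 2.0, 2.0)
        self.congwin = 1.0                          # Tahoe: back to 1 and slow start again

# Tiny demo: grow through slow start, suffer one loss, inspect the state.
w = TahoeWindow(threshold=8)
for _ in range(40):
    w.on_ack()
w.on_loss()
print(w.congwin, w.threshold)
```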

Page 24: NS-2 LAB: TCP/IP Congestion control

25

Additive Increase
- Additive increase is a reaction to perceived available capacity.
- Linear increase, basic idea: for each "cwnd's worth" of packets sent, increase cwnd by 1 packet.
- In practice, cwnd is incremented fractionally for each arriving ACK.

TCP congestion avoidance is AIMD: additive increase, multiplicative decrease
- additive increase: increase the window by 1 per RTT
- multiplicative decrease: decrease the window by a factor of 2 on a loss event

Per-ACK increment (with cwnd kept in bytes):
  increment = MSS * (MSS / cwnd)
  cwnd = cwnd + increment
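A sketch of that byte-counted, per-ACK update (the function and example values are mine; the formula is the one above):

```python
def on_ack_byte_mode(cwnd: float, mss: int = 1460) -> float:
    """Congestion-avoidance increment per ACK when cwnd is counted in bytes."""
    return cwnd + mss * (mss / cwnd)

# Over one window's worth of ACKs, cwnd grows by roughly one MSS in total.
cwnd = 10 * 1460.0
for _ in range(10):                 # ~10 ACKs arrive for a 10-segment window
    cwnd = on_ack_byte_mode(cwnd)
print(cwnd)                         # a bit over 11 * 1460
```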

Page 25: NS-2 LAB: TCP/IP Congestion control

26

Page 26: NS-2 LAB: TCP/IP Congestion control

27

Fast Retransmit
- Coarse timeouts remained a problem, and fast retransmit was added with TCP Tahoe.
- Since the receiver responds every time a packet arrives, the sender will see duplicate ACKs.
- Basic idea: use duplicate ACKs to signal a lost packet.
- Upon receipt of three duplicate ACKs, the TCP sender retransmits the lost packet.

Page 27: NS-2 LAB: TCP/IP Congestion control

28

Fast Retransmit
- Generally, fast retransmit eliminates about half the coarse-grain timeouts.
- This yields roughly a 20% improvement in throughput.
- Note: fast retransmit does not eliminate all the timeouts, due to small window sizes at the source.

Page 28: NS-2 LAB: TCP/IP Congestion control

29

Fast Recovery
- Fast recovery was added with TCP Reno.
- Basic idea: when fast retransmit detects three duplicate ACKs, start the recovery process from the congestion avoidance region and use the ACKs still in the pipe to pace the sending of packets.
- After fast retransmit, halve cwnd and commence recovery from that point using linear additive increase, "primed" by the leftover ACKs in the pipe.
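A minimal Reno-style counterpart to the Tahoe sketch above (again my own framing; only the three-duplicate-ACK trigger and the halving come from the slides):

```python
class RenoWindow:
    """Reno: on three duplicate ACKs, halve the window instead of restarting slow start."""
    def __init__(self, congwin=16.0, threshold=16.0):
        self.congwin = congwin
        self.threshold = threshold
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.congwin < self.threshold:
            self.congwin += 1.0                     # still in slow start
        else:
            self.congwin += 1.0 / self.congwin      # congestion avoidance

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:                      # fast retransmit trigger
            # (the presumed-lost segment is retransmitted here)
            self.threshold = max(self.congwin / 2.0, 2.0)
            self.congwin = self.threshold           # fast recovery: continue with additive increase
```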

Page 29: NS-2 LAB: TCP/IP Congestion control

30

TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link, each should get 1/N of the link capacity.

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]

Page 30: NS-2 LAB: TCP/IP Congestion control

31

Why is TCP fair?
Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally

[Figure: connection 1 throughput vs. connection 2 throughput, both bounded by R. Repeated cycles of "congestion avoidance: additive increase" and "loss: decrease window by factor of 2" move the operating point toward the equal-bandwidth-share line.]
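A toy simulation of that argument (a synchronized-loss model assumed purely for illustration; it is not from the slides): two AIMD flows sharing a link of capacity R end up with nearly equal rates regardless of how unequally they start.

```python
def aimd_two_flows(x1=1.0, x2=60.0, R=100.0, rounds=200):
    """Each round both flows add 1; both halve whenever their sum exceeds R."""
    for _ in range(rounds):
        x1 += 1.0
        x2 += 1.0
        if x1 + x2 > R:           # shared loss event at the bottleneck
            x1 /= 2.0
            x2 /= 2.0
    return x1, x2

print(aimd_two_flows())           # the two rates become nearly equal (fair share)
```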

Page 31: NS-2 LAB: TCP/IP Congestion control

32

TCP strains

Tahoe

Reno

Vegas

Page 32: NS-2 LAB: TCP/IP Congestion control

33

Vegas

Page 33: NS-2 LAB: TCP/IP Congestion control

34

From Reno to Vegas

                        Reno                        Vegas
RTT measurement         coarse                      fine
Fast Retransmit         3 dup ACKs                  dup ACK + timer
CongWin decrease        for each Fast Retrans.      for the 1st Fast Retrans. in an RTT
Congestion detection    loss detection              also based on measured throughput

Page 34: NS-2 LAB: TCP/IP Congestion control

35

Some more examples

Page 35: NS-2 LAB: TCP/IP Congestion control

36

TCP latency modeling
Q: How long does it take to receive an object from a Web server after sending a request?
- TCP connection establishment
- data transfer delay

Notation, assumptions:
- one link between client and server, of rate R
- fixed congestion window of W segments
- S: MSS (bits), O: object size (bits)
- no retransmissions (no loss, no corruption)

Two cases to consider:
- WS/R > RTT + S/R: the ACK for the first segment in the window returns before a window's worth of data has been sent
- WS/R < RTT + S/R: wait for the ACK after sending a window's worth of data

Page 36: NS-2 LAB: TCP/IP Congestion control

37

TCP latency Modeling

Case 1: latency = 2 RTT + O/R
Case 2: latency = 2 RTT + O/R + (K - 1) [S/R + RTT - WS/R]

where K := O/(WS) is the number of windows that cover the object.
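A small sketch that evaluates both cases of this static-window model (parameter names follow the slide; rounding K up with math.ceil and the example numbers are my own):

```python
import math

def static_window_latency(O, S, R, rtt, W):
    """Latency to fetch an O-bit object with a fixed window of W segments of S bits at rate R."""
    K = math.ceil(O / (W * S))                  # number of windows covering the object
    if W * S / R > rtt + S / R:                 # case 1: ACKs return before the window is exhausted
        return 2 * rtt + O / R
    # case 2: the sender stalls after every window except the last
    return 2 * rtt + O / R + (K - 1) * (S / R + rtt - W * S / R)

# Illustrative numbers: 100 kbit object, 10 kbit segments, R = 1 Mbps, RTT = 100 ms, W = 4
print(static_window_latency(O=100e3, S=10e3, R=1e6, rtt=0.1, W=4))   # ~0.44 s (case 2)
```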

Page 37: NS-2 LAB: TCP/IP Congestion control

38

TCP Latency Modeling: Slow Start
Now suppose the window grows according to slow start. We will show that the latency of one object of size O is:

  Latency = 2 RTT + O/R + P [RTT + S/R] - (2^P - 1) S/R

where P is the number of times TCP stalls at the server:

  P = min{Q, K - 1}

- where Q is the number of times the server would stall if the object were of infinite size,
- and K is the number of windows that cover the object.

Page 38: NS-2 LAB: TCP/IP Congestion control

39

TCP Latency Modeling: Slow Start (cont.)

[Figure: timeline at the client and at the server. After the initial RTT to set up the connection and request the object, the server sends the first window = S/R, second window = 2S/R, third window = 4S/R, fourth window = 8S/R, stalling between windows until the object is delivered.]

Example:
- O/S = 15 segments
- K = 4 windows
- Q = 2
- P = min{K-1, Q} = 2
- Server stalls P = 2 times.
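The slow-start formula as code, used to check the example above (the explicit computation of Q and K follows the definitions on the previous slide; the RTT and S/R values are illustrative, chosen so that Q = 2):

```python
import math

def slow_start_latency(O, S, R, rtt):
    """Latency (s) to fetch an O-bit object in S-bit segments at rate R with slow start, no loss."""
    K = math.ceil(math.log2(O / S + 1))         # windows of 1, 2, 4, ... segments cover the object
    Q = math.ceil(math.log2(1 + rtt * R / S))   # stalls if the object were infinitely long
    P = min(Q, K - 1)                           # actual number of server stalls
    return 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R

# O/S = 15 segments; with S/R = 10 ms and RTT = 28 ms this gives K = 4, Q = 2, P = 2.
print(slow_start_latency(O=15_000, S=1_000, R=100_000, rtt=0.028))   # ~0.252 s
```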

Page 39: NS-2 LAB: TCP/IP Congestion control

40

TCP Latency Modeling: Slow Start (cont.)

  latency = 2 RTT + O/R + sum over p = 1..P of stallTime_p
          = 2 RTT + O/R + sum over k = 1..P of [S/R + RTT - 2^(k-1) S/R]
          = 2 RTT + O/R + P [RTT + S/R] - (2^P - 1) S/R

where:
- S/R + RTT = time from when the server starts to send a segment until the server receives its acknowledgement
- 2^(k-1) S/R = time to transmit the k-th window
- [S/R + RTT - 2^(k-1) S/R]+ = stall time after the k-th window

[Figure: the same client/server timeline as on the previous slide (windows of S/R, 2S/R, 4S/R, 8S/R), with the stall intervals between successive windows marked.]

Page 40: NS-2 LAB: TCP/IP Congestion control

41

TCP Flow Control
- receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, via the RcvWindow field in the TCP segment
- sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow
- flow control: the sender won't overrun the receiver's buffers by transmitting too much, too fast

Receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer
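The sender-side rule can be written as a simple check (variable names are mine; the constraint is the one stated above):

```python
def can_send(last_byte_sent: int, last_byte_acked: int, rcv_window: int, nbytes: int) -> bool:
    """Allow sending nbytes only if the unACKed data stays within the advertised RcvWindow."""
    unacked = last_byte_sent - last_byte_acked
    return unacked + nbytes <= rcv_window

print(can_send(last_byte_sent=5000, last_byte_acked=3000, rcv_window=4000, nbytes=1000))  # True
print(can_send(last_byte_sent=5000, last_byte_acked=3000, rcv_window=4000, nbytes=3000))  # False
```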

Page 41: NS-2 LAB: TCP/IP Congestion control

42

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments.
- initialize TCP variables: seq. #s, buffers, flow control info (e.g. RcvWindow)
- client: connection initiator
    Socket clientSocket = new Socket("hostname", "port number");
- server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client sends a TCP SYN control segment to the server; specifies the initial seq #
Step 2: server receives the SYN, replies with a SYNACK control segment: ACKs the received SYN, allocates buffers, specifies the server-to-client initial seq. #
Step 3: client sends ACK and data.
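The same client/server roles in Python's socket API, as a self-contained localhost sketch (the slide's snippets use Java sockets; this is just an equivalent illustration):

```python
import socket

# Server side: a listening socket; the kernel completes handshakes into the accept backlog.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # port 0 lets the OS pick a free port
server.listen(1)
host, port = server.getsockname()

# Client side: connect() carries out the SYN -> SYNACK -> ACK exchange.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host, port))

# Server side: accept() hands back the established connection.
conn, addr = server.accept()

# Teardown: close() on each side sends a FIN, mirroring the closing sequence on the next slides.
client.close()
conn.close()
server.close()
```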

Page 42: NS-2 LAB: TCP/IP Congestion control

43

TCP Connection Management (cont.)

Closing a connection:

client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server.

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: closing sequence. The client sends FIN; the server ACKs, closes, and sends its own FIN; the client ACKs and enters a timed wait before the connection is fully closed.]

Page 43: NS-2 LAB: TCP/IP Congestion control

44

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.

Enters “timed wait” - will respond with ACK to received FINs

Step 4: server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: the same FIN/ACK exchange, annotated with the client and server states (closing, timed wait, closed).]

Page 44: NS-2 LAB: TCP/IP Congestion control

45

TCP Connection Management (cont)

[Figure: TCP client lifecycle and TCP server lifecycle state diagrams.]

Page 45: NS-2 LAB: TCP/IP Congestion control

46

Summary
- principles behind transport layer services: multiplexing/demultiplexing, reliable data transfer, flow control, congestion control
- instantiation and implementation in the Internet: UDP, TCP

Page 46: NS-2 LAB: TCP/IP Congestion control

48

Queue management: Packet Dropping: Tail Drop
- Tail drop: packets are dropped when the queue is full
- causes the global synchronization problem with TCP

[Figure: queue utilization (up to 100%) over time under tail drop.]

Page 47: NS-2 LAB: TCP/IP Congestion control

49

Packet Dropping: RED (Random Early Detection)
- proposed by Sally Floyd and Van Jacobson in the early 1990s
- packets are dropped randomly prior to periods of high congestion, which signals the packet sources to decrease their transmission rate
- distributes losses over time

Page 48: NS-2 LAB: TCP/IP Congestion control

50

RED - Implementation
- Drop probability is based on min_threshold, max_threshold, and the mark probability denominator.
- When the average queue depth is above the minimum threshold, RED starts dropping packets. The rate of packet drop increases linearly as the average queue size increases, until the average queue size reaches the maximum threshold.
- When the average queue size is above the maximum threshold, all packets are dropped.

Page 49: NS-2 LAB: TCP/IP Congestion control

51

RED (cont.)

Buffer occupancy calculation: the average queue size is a weighted exponential moving average.

[Figure: drop probability vs. average queue size: 0 below the min threshold, rising linearly to Max Prob at the max threshold, then jumping to 1.]
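A minimal sketch of that drop decision (parameter names such as wq and max_p are conventional RED names, not given on the slides):

```python
import random

def update_avg_queue(avg: float, current_qlen: int, wq: float = 0.002) -> float:
    """Weighted exponential moving average of the queue size."""
    return (1 - wq) * avg + wq * current_qlen

def red_drop(avg: float, min_th: float, max_th: float, max_p: float) -> bool:
    """RED drop decision for one arriving packet, given the average queue size."""
    if avg < min_th:
        return False                                   # no drops below the minimum threshold
    if avg >= max_th:
        return True                                    # drop everything above the maximum threshold
    p = max_p * (avg - min_th) / (max_th - min_th)     # linear ramp between the thresholds
    return random.random() < p
```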

Page 50: NS-2 LAB: TCP/IP Congestion control

52

GRED Gentle RED

Buffer occupancy calculation: the average queue size is a weighted exponential moving average (as in RED).

[Figure: gentle RED drop probability vs. average queue size: 0 below the min threshold, rising linearly to Max Prob at the max threshold, then rising linearly from Max Prob to 1 between max and 2*max.]
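Gentle RED only changes the region above the maximum threshold; a sketch of the modified drop-probability curve (again with conventional parameter names):

```python
def gred_drop_prob(avg: float, min_th: float, max_th: float, max_p: float) -> float:
    """Gentle RED: extend the ramp from max_p at max_th up to 1.0 at 2*max_th."""
    if avg < min_th:
        return 0.0
    if avg < max_th:
        return max_p * (avg - min_th) / (max_th - min_th)
    if avg < 2 * max_th:
        return max_p + (1 - max_p) * (avg - max_th) / max_th
    return 1.0
```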