Modeling TCP Throughput: A Simple Model and its Empirical Validation

Jitendra Padhye, Victor Firoiu, Don Towsley, and Jim Kurose

SIGCOMM 1998

Contributions
- Develop a simple analytic characterization of the steady-state throughput of a bulk-transfer TCP flow (i.e., a flow with an unlimited amount of data to send) as a function of loss rate and round-trip time
- The model captures both the behavior of TCP's fast retransmit mechanism and the effect of TCP's timeout mechanism on throughput
- The model accurately predicts throughput over a significantly wider range of loss rates than previous work
- Explicitly models the effects of small receiver-side windows

A model for TCP Congestion Control
- Focus on the congestion avoidance behavior of TCP and its impact on throughput, taking into account the dependence of congestion avoidance on ACK behavior, the manner in which packet loss is inferred (duplicate-ACK detection and fast retransmit, or timeout), limited receiver window size, and average round-trip time (RTT)
- The model is based on TCP Reno
- Recall: in TCP's congestion avoidance,
  - the congestion window, W, is increased by 1/W each time a regular ACK is received
  - conversely, the window is decreased whenever a lost packet is detected:
    - if the loss is detected by triple duplicate ACKs, cwnd = cwnd / 2
    - if the loss is detected by a timeout, cwnd = 1 (slow start)

Details of the model
- TCP's congestion avoidance behavior is modeled in terms of rounds:
  - a round starts with the back-to-back transmission of W packets, where W is the current size of the TCP congestion window
  - once all packets falling within the congestion window have been sent in this back-to-back manner, no other packets are sent until the first ACK is received for one of these W packets
  - this ACK reception marks the end of the current round and the beginning of the next round
  - note that, in this model, the duration of a round is equal to the round-trip time and is assumed to be independent of the window size

- At the beginning of the next round, a group of W' new packets will be sent, where W' is the new size of the congestion window
- Let b be the number of packets that are acknowledged by a received ACK (e.g., b = 2 with delayed ACKs)
- if W packets are sent in the first round and are all received and acknowledged correctly, then W/b ACKs will be received
- since each ACK increases the window size by 1/W, the window size at the beginning of the second round is W' = W + 1/b
- that is, during congestion avoidance and in the absence of loss, the window size increases linearly in time, with a slope of 1/b packets per round-trip time
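The linear growth described above can be checked with a small simulation. This is a minimal sketch, assuming loss-free rounds and one ACK per b packets; the function name `window_after_rounds` is hypothetical and does not come from the paper.

```python
def window_after_rounds(w0: float, rounds: int, b: int = 2) -> float:
    """Window size after `rounds` loss-free rounds of congestion avoidance.

    Assumption: every ACK covers b packets and adds 1/W to the window,
    where W is the window size at the start of the round.
    """
    w = w0
    for _ in range(rounds):
        acks = w / b            # W/b ACKs arrive during this round
        w += acks * (1.0 / w)   # each ACK grows the window by 1/W
    return w


if __name__ == "__main__":
    # With delayed ACKs (b = 2) the window gains about half a packet per round.
    print(window_after_rounds(10.0, 10, b=2))
```

Whatever the starting window, each round adds 1/b packets, which is the slope the rest of the model relies on.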

Assumptions:
- The duration of a round is assumed to be independent of the window size
- The time needed to send all the packets in a window is smaller than the round-trip time
- A packet is lost in a round independently of any packets lost in other rounds
- On the other hand, if a packet is lost, all remaining packets transmitted until the end of that round are also lost (bursty loss behavior, as with tail-drop queues)
- Throughput is measured in packets per unit of time

Loss indications are exclusively triple-duplicate ACKs
- loss indications are exclusively of type triple-duplicate ACK (TD)
- the window size is not limited by the receiver's advertised window
- the flow starts at time t = 0, and the sender always has data to send
- for any given time t > 0,
  - let $N_t$ be the number of packets transmitted in the interval [0, t], and
  - let $B_t = N_t / t$ be the throughput over that interval
- the long-term steady-state TCP throughput B is defined as:

$$B = \lim_{t \to \infty} B_t = \lim_{t \to \infty} \frac{N_t}{t}$$

- let p be the probability that a packet is lost, given that either it is the first packet in its round or the preceding packet in its round is not lost
- note that we are interested in establishing the relationship B(p) between throughput and loss probability

- a TD period (TDP) is a period between two TD loss indications
- between two TD loss indications, the sender is in congestion avoidance and the window increases with slope 1/b packets per round
- for the i-th TD period,
  - $Y_i$: the number of packets sent in the period
  - $A_i$: the duration of the period
  - $W_i$: the window size at the end of the period
- considering $\{W_i\}_i$ to be a Markov regenerative process with rewards $\{Y_i\}_i$, it can be shown that:

$$B = \frac{E[Y]}{E[A]} \qquad (1)$$

to derive B, the long-term steady state throughput, we must derive E[Y] and E[A]

- a TD period starts immediately after a TD loss indication, so the congestion window at its start is $W_{i-1}/2$, half the window size before the TD occurred
- at the end of each round the window is incremented by 1/b, so the number of packets sent per round increases by one every b rounds
- $\alpha_i$: the first packet lost in $TDP_i$
- $X_i$: the round in which this loss occurs
- after packet $\alpha_i$, $W_i - 1$ more packets are sent in an additional round before a TD loss indication occurs (and the current TD period ends)
- thus, a total of $Y_i = \alpha_i + W_i - 1$ packets are sent in $X_i + 1$ rounds
- it follows that:

$$E[Y] = E[\alpha] + E[W] - 1 \qquad (3)$$

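- to derive $E[\alpha]$, consider the random process $\{\alpha_i\}_i$, where $\alpha_i$ is the number of packets sent in a TD period up to and including the first packet that is lost
- based on the assumption that packets are lost in a round independently of any packets lost in other rounds, $\{\alpha_i\}_i$ is a sequence of independent and identically distributed (i.i.d.) random variables
- given the proposed loss model, the probability that $\alpha_i = k$ is equal to the probability that exactly k-1 packets are successfully acknowledged before a loss occurs:

$$P[\alpha = k] = (1-p)^{k-1} p, \quad k = 1, 2, \ldots$$

- the mean of $\alpha$ is therefore

$$E[\alpha] = \sum_{k=1}^{\infty} k\,(1-p)^{k-1} p = \frac{1}{p}$$

- substituting into $E[Y] = E[\alpha] + E[W] - 1$ gives

$$E[Y] = \frac{1-p}{p} + E[W] \qquad (5)$$

- now, we have to derive E[W] and E[A]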

- to derive E[W] and E[A], consider again $TDP_i$
- define $r_{ij}$ to be the duration (round-trip time) of the j-th round of $TDP_i$
- consider the round-trip times $r_{ij}$ to be random variables that are assumed to be independent of the congestion window size, and thus independent of the round number j
- then, the duration of $TDP_i$ is

$$A_i = \sum_{j=1}^{X_i + 1} r_{ij}$$

- it follows from the assumption above that:

$$E[A] = (E[X] + 1)\, E[r] \qquad (6)$$

- the paper denotes E[r] by RTT, the average round-trip time
- now, we have to derive an expression for E[X]

- to derive an expression for E[X], consider the evolution of $W_i$ as a function of the number of rounds, as in figure 2
- for simplicity, in this derivation it is assumed that $W_{i-1}/2$ and $X_i/b$ are integers
- during the i-th TD period, the window size increases from $W_{i-1}/2$ to $W_i$. Since the increase is linear with slope 1/b, we have:

$$W_i = \frac{W_{i-1}}{2} + \frac{X_i}{b} \qquad (7)$$

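- next, the fact that $Y_i$ packets are transmitted in $TDP_i$ is expressed by:

$$Y_i = \sum_{k=0}^{X_i/b - 1} \left(\frac{W_{i-1}}{2} + k\right) b + \beta_i = \frac{X_i W_{i-1}}{2} + \frac{X_i}{2}\left(\frac{X_i}{b} - 1\right) + \beta_i = \frac{X_i}{2}\left(\frac{W_{i-1}}{2} + W_i - 1\right) + \beta_i \qquad (10)$$

- where $\beta_i$ is the number of packets sent in the last round (round $X_i + 1$)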

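- assuming that $\{X_i\}$ and $\{W_i\}$ are mutually independent sequences of random variables, it follows from (7) that

$$E[W] = \frac{2}{b}\, E[X] \qquad (11)$$

- it also follows from (10) and (5) that

$$\frac{1-p}{p} + E[W] = \frac{E[X]}{2}\left(\frac{E[W]}{2} + E[W] - 1\right) + E[\beta] \qquad (12)$$

- we consider $\beta_i$, the number of packets in the last round, to be uniformly distributed between 1 and $W_i$, and thus $E[\beta] = E[W]/2$
- from (11) and (12), we have:

$$E[W] = \frac{2+b}{3b} + \sqrt{\frac{8(1-p)}{3bp} + \left(\frac{2+b}{3b}\right)^2} \qquad (13)$$

- observe that

$$E[W] = \sqrt{\frac{8}{3bp}} + o(1/\sqrt{p}) \qquad (14)$$

  i.e., $E[W] \approx \sqrt{8/(3bp)}$ for small values of p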

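- from (11) and (13), we have

$$E[X] = \frac{2+b}{6} + \sqrt{\frac{2b(1-p)}{3p} + \left(\frac{2+b}{6}\right)^2} \qquad (15)$$

- from (6) and (15), we have

$$E[A] = RTT \left(\frac{2+b}{6} + \sqrt{\frac{2b(1-p)}{3p} + \left(\frac{2+b}{6}\right)^2} + 1\right) \qquad (16)$$

- observe that

$$E[X] = \sqrt{\frac{2b}{3p}} + o(1/\sqrt{p}) \qquad (17)$$

- from (1) and (5), we have

$$B(p) = \frac{\frac{1-p}{p} + E[W]}{E[A]} \qquad (18)$$

- substituting (13) and (16) in (18), we get

$$B(p) = \frac{\frac{1-p}{p} + \frac{2+b}{3b} + \sqrt{\frac{8(1-p)}{3bp} + \left(\frac{2+b}{3b}\right)^2}}{RTT \left(\frac{2+b}{6} + \sqrt{\frac{2b(1-p)}{3p} + \left(\frac{2+b}{6}\right)^2} + 1\right)} \qquad (19)$$

- equation (19) can be expressed as:

$$B(p) = \frac{1}{RTT} \sqrt{\frac{3}{2bp}} + o(1/\sqrt{p}) \qquad (20)$$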
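The TD-only model can be evaluated numerically to see how close the square-root approximation is to the exact expression. The following is a minimal sketch based on the formulas for E[W], E[X] and B(p) as reconstructed in these notes; the function names are hypothetical, and throughput is in packets per second when `rtt` is given in seconds.

```python
import math


def expected_window(p: float, b: int = 2) -> float:
    """E[W]: mean window size at the end of a TD period."""
    c = (2 + b) / (3 * b)
    return c + math.sqrt(8 * (1 - p) / (3 * b * p) + c * c)


def expected_rounds(p: float, b: int = 2) -> float:
    """E[X] = (b/2) * E[W]: mean number of rounds up to the loss round."""
    return b / 2 * expected_window(p, b)


def throughput_td_only(p: float, rtt: float, b: int = 2) -> float:
    """B(p) = E[Y] / E[A] when all loss indications are triple-duplicate ACKs."""
    e_y = (1 - p) / p + expected_window(p, b)
    e_a = rtt * (expected_rounds(p, b) + 1)
    return e_y / e_a


def throughput_sqrt_approx(p: float, rtt: float, b: int = 2) -> float:
    """Small-p approximation (1/RTT) * sqrt(3 / (2*b*p))."""
    return math.sqrt(3 / (2 * b * p)) / rtt


if __name__ == "__main__":
    for p in (0.0001, 0.001, 0.01, 0.1):
        exact = throughput_td_only(p, rtt=0.1)
        approx = throughput_sqrt_approx(p, rtt=0.1)
        print(f"p={p}: exact={exact:.1f} approx={approx:.1f}")
```

The two expressions agree closely for small p and drift apart as p grows, which is where the timeout extension of the model becomes important.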

Loss indications are triple-duplicate ACKs and time-outs
- according to the paper's measurements, the majority of window decreases are due to time-outs rather than fast retransmits
- hence, a good model should capture time-out loss indications
- to capture them, the model has to be extended to include the case where the TCP sender times out
- this occurs when packets (or ACKs) are lost and fewer than three duplicate ACKs are received

- the sender waits for a period of time denoted by $T_0$, and then retransmits non-acknowledged packets
- following a time-out, the congestion window is reduced to one, and one packet is thus resent in the first round after a time-out
- if another time-out occurs before the packets lost during the first time-out are successfully retransmitted, the time-out period doubles to $2T_0$; this doubling is repeated for each unsuccessful retransmission until $64T_0$ is reached, after which the time-out period remains constant at $64T_0$

- $Z_i^{TO}$ denotes the duration of a sequence of time-outs (no successful retransmission in those periods)
- $Z_i^{TD}$ denotes the time interval between two consecutive time-out sequences (there is some successful retransmission and a number of TD periods within the interval)
- define $S_i$ to be: $S_i = Z_i^{TD} + Z_i^{TO}$
- define $M_i$ to be the number of packets sent during $S_i$
- define, also, $R_i$ to be the number of packets sent during the time-out sequence $Z_i^{TO}$

- given that $\{(S_i, M_i)\}_i$ is an i.i.d. sequence of random variables, we have:

$$B = \frac{E[M]}{E[S]}$$

- let $n_i$ be the number of TD periods in interval $Z_i^{TD}$
- for the j-th TD period of the interval,
  - define $Y_{ij}$ to be the number of packets sent in the period
  - define $A_{ij}$ to be the duration of the period
  - define $X_{ij}$ to be the number of rounds in the period
  - define $W_{ij}$ to be the window size at the end of the period
- note that the definition of a TD period is extended to include a period:
  - between two TD loss indications (original definition), or
  - starting after a TO loss indication and ended by a TD loss indication, or
  - starting after a TD loss indication and ended by a TO loss indication

- hence, we have:

$$M_i = \sum_{j=1}^{n_i} Y_{ij} + R_i, \qquad S_i = \sum_{j=1}^{n_i} A_{ij} + Z_i^{TO}$$

- assuming $\{n_i\}_i$ to be an i.i.d. sequence of random variables, independent of $\{Y_{ij}\}$ and $\{A_{ij}\}$, we have:

$$E[M] = E[n]\,E[Y] + E[R], \qquad E[S] = E[n]\,E[A] + E[Z^{TO}]$$

- to derive E[n]:
  - observe that, during $Z_i^{TD}$, the time between two consecutive time-out sequences, there are $n_i$ TDPs, where each of the first $n_i - 1$ ends in a TD, and the last TDP ends in a TO
  - it follows that in $Z_i^{TD}$ there is one TO out of $n_i$ loss indications
  - therefore, if we denote by Q the probability that a loss indication ending a TDP is a TO, we have E[n] = 1 / Q
  - note that $n_i$ is treated as geometrically distributed with parameter Q
- consequently,

$$B = \frac{E[Y] + Q\,E[R]}{E[A] + Q\,E[Z^{TO}]} \qquad (21)$$

- since $Y_{ij}$ and $A_{ij}$ do not depend on time-outs, their means are those derived in (4) and (16)
- to compute TCP throughput using (21), we must still determine Q, E[R] and $E[Z^{TO}]$

- to derive an expression for Q, consider the round where a loss occurs in figure 4; it will be referred to as the penultimate round
- note that, in figure 4, ACKs are not delayed (b = 1) for simplicity of illustration
- let w be the current congestion window size
- thus, packets $f_1, \ldots, f_w$ are sent in the penultimate round
- packets $f_1, \ldots, f_k$ are acknowledged
- packet $f_{k+1}$ is the first one to be lost (or not ACKed)
- again, from the assumption that packet losses are correlated within a round, all packets following $f_{k+1}$ in the penultimate round are also lost
- however, since packets $f_1, \ldots, f_k$ are ACKed, another k packets, $s_1, \ldots, s_k$, are sent in the next round, which will be referred to as the last round
- this last round may contain another loss, say packet $s_{m+1}$
- again, based on the assumption about packet loss correlation, $s_{m+2}, \ldots, s_k$ are also lost in the last round

- the m packets successfully sent in the last round are answered by ACKs for packet $f_k$, which are counted as duplicate ACKs
- since ACKs are not delayed in this scenario, the number of duplicate ACKs is equal to the number of successfully received packets in the last round
- if the number of such ACKs is at least three, a TD indication occurs; otherwise, a TO occurs
- in both cases, the current period between losses, the TDP, ends
- define A(w,k) to be the probability that the first k packets are ACKed in a round of w packets, given that there is a sequence of one or more losses in the round:

$$A(w,k) = \frac{(1-p)^k\, p}{1 - (1-p)^w}$$

- also, define C(n,m) to be the probability that m packets are ACKed in sequence in the last round (where n packets were sent; p is the probability that a packet will be lost) and the rest of the packets in the round, if any, are lost:

$$C(n,m) = \begin{cases} (1-p)^m\, p & m \le n-1 \\ (1-p)^n & m = n \end{cases}$$

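- then $\hat{Q}(w)$, the probability that a loss in a window of size w is a TO, is given by

$$\hat{Q}(w) = \begin{cases} 1 & w \le 3 \\ \sum_{k=0}^{2} A(w,k) + \sum_{k=3}^{w-1} A(w,k) \sum_{m=0}^{2} C(k,m) & \text{otherwise} \end{cases}$$

- note that,
  - $\sum_{k=0}^{2} A(w,k)$: the number of packets successfully transmitted in the penultimate round, k, is less than three
  - $\sum_{k=3}^{w-1} A(w,k) \sum_{m=0}^{2} C(k,m)$: the number of packets successfully transmitted in the penultimate round, k, is at least three; however, the number of packets successfully transmitted in the last round, m, is less than three
- after algebraic manipulations, we have

$$\hat{Q}(w) = \min\left(1, \frac{\left(1-(1-p)^3\right)\left(1+(1-p)^3\left(1-(1-p)^{w-3}\right)\right)}{1-(1-p)^w}\right) \qquad (23)$$

- numerically, a very good approximation of $\hat{Q}$ is

$$\hat{Q}(w) \approx \min\left(1, \frac{3}{w}\right) \qquad (24)$$

- Q, the probability that a loss indication is a TO, is

$$Q = \sum_{w=1}^{\infty} \hat{Q}(w)\, P[W = w] = E[\hat{Q}]$$

  which is approximated by $Q \approx \hat{Q}(E[W])$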
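The three forms of the timeout probability can be compared numerically. The following is a minimal sketch under the loss model described above (losses within a round are correlated, ACKs are not delayed); the function names are hypothetical, and the sum form lets k run over the possible positions of the first loss, 0 to w-1.

```python
def a_prob(w: int, k: int, p: float) -> float:
    """A(w, k): the first k of w packets are ACKed, given a loss in the round."""
    return (1 - p) ** k * p / (1 - (1 - p) ** w)


def c_prob(n: int, m: int, p: float) -> float:
    """C(n, m): exactly the first m of the n packets in the last round are ACKed."""
    return (1 - p) ** m * p if m <= n - 1 else (1 - p) ** n


def q_hat_sum(w: int, p: float) -> float:
    """Timeout probability for a window of w packets, summed case by case."""
    if w <= 3:
        return 1.0
    # Fewer than three packets ACKed in the penultimate round.
    few_in_penultimate = sum(a_prob(w, k, p) for k in range(3))
    # At least three ACKed there, but fewer than three ACKed in the last round.
    few_in_last = sum(
        a_prob(w, k, p) * sum(c_prob(k, m, p) for m in range(3))
        for k in range(3, w)
    )
    return min(1.0, few_in_penultimate + few_in_last)


def q_hat_closed(w: float, p: float) -> float:
    """Closed form of the same probability."""
    if w <= 3:
        return 1.0
    q = 1 - p
    return min(1.0, (1 - q**3) * (1 + q**3 * (1 - q ** (w - 3))) / (1 - q**w))


def q_hat_approx(w: float) -> float:
    """Approximation min(1, 3/w), which drops the dependence on p."""
    return min(1.0, 3.0 / w)


if __name__ == "__main__":
    for w in (4, 8, 16, 32):
        print(w, q_hat_sum(w, 0.02), q_hat_closed(w, 0.02), q_hat_approx(w))
```

The closed form also accepts a non-integer window, which is convenient when it is evaluated at E[W].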

- next, we consider the derivation of E[R]
- from observation of TCP traces, in most cases one packet is transmitted between two consecutive time-outs in a sequence
- in addition, a sequence of k TOs occurs when there are k-1 consecutive losses (the first loss is given) followed by a successfully transmitted packet
- consequently, the number of TOs in a TO sequence has a geometric distribution, and thus

$$P[R = k] = p^{k-1}(1-p)$$

- then, the expected value of R is

$$E[R] = \sum_{k=1}^{\infty} k\, P[R = k] = \frac{1}{1-p}$$

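- next, we focus on $E[Z^{TO}]$, the average duration of a time-out sequence excluding retransmission times
- the first six time-outs in one sequence have length $2^{i-1} T_0$, where $i = 1, \ldots, 6$
- all immediately following time-outs have length $64 T_0$
- then, the duration of a sequence with k time-outs is

$$L_k = \begin{cases} (2^k - 1)\, T_0 & k \le 6 \\ (63 + 64(k-6))\, T_0 & k \ge 7 \end{cases}$$

- the mean of $Z^{TO}$ is

$$E[Z^{TO}] = \sum_{k=1}^{\infty} L_k\, P[R = k] = T_0\, \frac{1 + p + 2p^2 + 4p^3 + 8p^4 + 16p^5 + 32p^6}{1-p}$$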
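The closed form for the mean duration of a time-out sequence can be cross-checked against a direct summation over the geometric distribution of the number of time-outs. The following is a minimal sketch, assuming the backoff rule described above (doubling from T0 up to a cap of 64*T0); the function names and the truncation at 2000 terms are choices made for this illustration.

```python
def timeout_sequence_duration(k: int, t0: float) -> float:
    """L_k: total length of k consecutive time-outs with backoff capped at 64*T0."""
    if k <= 6:
        return (2**k - 1) * t0
    return (63 + 64 * (k - 6)) * t0


def f(p: float) -> float:
    """Polynomial f(p) appearing in the closed form."""
    return 1 + p + 2 * p**2 + 4 * p**3 + 8 * p**4 + 16 * p**5 + 32 * p**6


def mean_timeout_duration(p: float, t0: float) -> float:
    """Closed form: T0 * f(p) / (1 - p)."""
    return t0 * f(p) / (1 - p)


def mean_timeout_duration_by_sum(p: float, t0: float, terms: int = 2000) -> float:
    """Truncated sum of L_k * P[R = k], with P[R = k] = p^(k-1) * (1 - p)."""
    return sum(
        timeout_sequence_duration(k, t0) * p ** (k - 1) * (1 - p)
        for k in range(1, terms + 1)
    )


if __name__ == "__main__":
    print(mean_timeout_duration(0.3, 1.5))
    print(mean_timeout_duration_by_sum(0.3, 1.5))
```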

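- with the expressions for Q, E[S], E[R] and $E[Z^{TO}]$, equation (21) for B(p) can be expressed as:

$$B(p) = \frac{\frac{1-p}{p} + E[W] + \hat{Q}(E[W])\, \frac{1}{1-p}}{RTT\,(E[X] + 1) + \hat{Q}(E[W])\, T_0\, \frac{f(p)}{1-p}} \qquad (27)$$

  where

$$f(p) = 1 + p + 2p^2 + 4p^3 + 8p^4 + 16p^5 + 32p^6 \qquad (28)$$

- $\hat{Q}$ is given in (23), E[W] in (13) and E[X] in (15)
- using (24), (14) and (17), (27) can be approximated by:

$$B(p) \approx \frac{1}{RTT \sqrt{\frac{2bp}{3}} + T_0 \min\left(1, 3\sqrt{\frac{3bp}{8}}\right) p\,(1 + 32p^2)} \qquad (29)$$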

The impact of window limitation
- during a period without loss indications, the sender's window is determined by both the congestion avoidance algorithm and the receiver's advertised window
- sender's window = min(cwnd, advertised window)
- let $W_{max}$ denote the maximum window size advertised by the receiver
- as a consequence, during a period without loss indications, the window size can grow up to $W_{max}$, but will not grow beyond this value

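- define $W_u$ to be the unconstrained window size, the mean of which is given in (13):

$$E[W_u] = \frac{2+b}{3b} + \sqrt{\frac{8(1-p)}{3bp} + \left(\frac{2+b}{3b}\right)^2}$$

- if $E[W_u] < W_{max}$, the receiver-window limitation has negligible effect on the long-term average of the TCP throughput, and thus the TCP throughput is given by (27)
- if $W_{max} \le E[W_u]$, the window is limited by the receiver, and the approximation $E[W] \approx W_{max}$ is used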

- during the first TDP, the window grows linearly up to $W_{max}$ for $U_1$ rounds, then remains constant for $V_1$ rounds
- then a TD indication occurs, the window drops to $W_{max}/2$, and the process repeats
- thus, $W_i = W_{i-1}/2 + U_i/b$, and taking expectations, $E[W] = E[W]/2 + E[U]/b$
- with $E[W] \approx W_{max}$ this gives $W_{max}/2 = E[U]/b$, so $E[U] = (b/2)\, W_{max}$

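- considering the number of packets sent in the i-th TD period, we have:

$$Y_i = \frac{U_i}{2}\left(\frac{W_{max}}{2} + W_{max}\right) + V_i\, W_{max}$$

- and then

$$E[Y] = \frac{E[U]}{2}\left(\frac{W_{max}}{2} + W_{max}\right) + E[V]\, W_{max} = \frac{3b}{8} W_{max}^2 + W_{max}\, E[V]$$

- since $Y_i$, the number of packets in the i-th TD period, does not depend on window limitation, E[Y] is given by (5), $E[Y] = (1-p)/p + W_{max}$, and thus

$$E[V] = \frac{1-p}{p\, W_{max}} + 1 - \frac{3b}{8} W_{max}$$

- finally, since $X_i = U_i + V_i$, we have

$$E[X] = \frac{b}{8} W_{max} + \frac{1-p}{p\, W_{max}} + 1$$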

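- by substituting this result for E[X] and $E[W] \approx W_{max}$ in (27), we obtain the TCP throughput, B(p), when the window is limited:

$$B(p) = \frac{\frac{1-p}{p} + W_{max} + \hat{Q}(W_{max})\, \frac{1}{1-p}}{RTT \left(\frac{b}{8} W_{max} + \frac{1-p}{p\, W_{max}} + 2\right) + \hat{Q}(W_{max})\, T_0\, \frac{f(p)}{1-p}}$$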

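- in conclusion, the complete characterization of TCP throughput, B(p), is:

$$B(p) = \begin{cases} \dfrac{\frac{1-p}{p} + E[W_u] + \hat{Q}(E[W_u])\, \frac{1}{1-p}}{RTT \left(\frac{b}{2} E[W_u] + 1\right) + \hat{Q}(E[W_u])\, T_0\, \frac{f(p)}{1-p}} & \text{if } E[W_u] < W_{max} \\[2ex] \dfrac{\frac{1-p}{p} + W_{max} + \hat{Q}(W_{max})\, \frac{1}{1-p}}{RTT \left(\frac{b}{8} W_{max} + \frac{1-p}{p\, W_{max}} + 2\right) + \hat{Q}(W_{max})\, T_0\, \frac{f(p)}{1-p}} & \text{otherwise} \end{cases} \qquad (31)$$

- where f(p) is given in (28), $\hat{Q}$ is given in (23) and $E[W_u]$ is given in (13)
- the following approximation of B(p) follows from (29) and (31):

$$B(p) \approx \min\left(\frac{W_{max}}{RTT},\ \frac{1}{RTT \sqrt{\frac{2bp}{3}} + T_0 \min\left(1, 3\sqrt{\frac{3bp}{8}}\right) p\,(1 + 32p^2)}\right) \qquad (32)$$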

- equation (31) will be referred to as the full model
- equation (32) will be referred to as the approximate model
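Both models are simple enough to evaluate directly. The following is a minimal sketch of the full and approximate models as reconstructed in these notes, not the authors' code; the function names are hypothetical, `rtt` and `t0` are in seconds, `wmax` is in packets, and the result is in packets per second.

```python
import math


def expected_unconstrained_window(p: float, b: int = 2) -> float:
    """E[W_u]: mean window size when the receiver window is not a limit."""
    c = (2 + b) / (3 * b)
    return c + math.sqrt(8 * (1 - p) / (3 * b * p) + c * c)


def q_hat(w: float, p: float) -> float:
    """Probability that a loss in a window of size w leads to a time-out."""
    if w <= 3:
        return 1.0
    q = 1 - p
    return min(1.0, (1 - q**3) * (1 + q**3 * (1 - q ** (w - 3))) / (1 - q**w))


def f(p: float) -> float:
    return 1 + p + 2 * p**2 + 4 * p**3 + 8 * p**4 + 16 * p**5 + 32 * p**6


def full_model(p: float, rtt: float, t0: float, wmax: float, b: int = 2) -> float:
    """Full model: TD periods, time-outs, and receiver window limitation."""
    e_wu = expected_unconstrained_window(p, b)
    if e_wu < wmax:
        w = e_wu
        rounds = b / 2 * e_wu + 1                          # E[X] + 1
    else:
        w = wmax
        rounds = b / 8 * wmax + (1 - p) / (p * wmax) + 2   # E[X] + 1, limited case
    q = q_hat(w, p)
    packets = (1 - p) / p + w + q / (1 - p)
    duration = rtt * rounds + q * t0 * f(p) / (1 - p)
    return packets / duration


def approximate_model(p: float, rtt: float, t0: float, wmax: float, b: int = 2) -> float:
    """Approximate model: capped by Wmax/RTT, with simplified TD and TO terms."""
    td_term = rtt * math.sqrt(2 * b * p / 3)
    to_term = t0 * min(1.0, 3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p**2)
    return min(wmax / rtt, 1 / (td_term + to_term))


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.05, 0.2):
        print(p, full_model(p, 0.2, 2.0, 32), approximate_model(p, 0.2, 2.0, 32))
```

The parameter values in the usage loop (RTT of 0.2 s, T0 of 2 s, a 32-packet window) are illustrative only and are not taken from the paper's traces.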


Measurements and Trace Analysis
- equations (31) and (32) provide an analytic characterization of TCP throughput as a function of packet loss indication rate, RTT and maximum window size
- next, these formulae are validated empirically
- the measurement data were collected from 37 TCP connections established between 18 hosts scattered across the United States and Europe

- table 1 lists the domains and operating systems of the 18 hosts
- all data sets are for unidirectional bulk data transfer
- the measurement data were gathered by running tcpdump at the sender and analyzing its output
- various measurement- and implementation-related problems had to be handled, e.g., the Linux sender uses two duplicate ACKs to indicate loss instead of three
- the trace analysis programs were further verified by checking them against tcptrace and ns

Table 2:
- 24 data sets, each corresponding to a 1-hour-long TCP connection
- the sender behaves as an infinite source
- p is estimated as (total number of loss indications) / (total number of packets sent)
- the 5th and 6th columns show a breakdown of the loss indications into TD and TO
- the last two columns report the average round-trip time and the average duration of a single timeout ($T_0$)
- these values have been averaged over the entire trace
- the experiments were performed at randomly selected times during 1997 and the beginning of 1998

Table 3 reports summary results from an additional 13 data sets:
- each data set represents 100 serially initiated TCP connections between a given sender-receiver pair
- each connection lasted 100 seconds, and was followed by a 50-second gap before the next connection was initiated
- these experiments were performed at randomly selected times during 1998

- important observations drawn from the data in these tables:
  - in all traces, timeouts constitute the majority or a significant fraction of the total number of loss indications
  - exponential backoff due to multiple timeouts occurs with significant frequency
- next, the measurement data described above are used to validate the model proposed in the paper
- each one-hour trace was divided into 36 consecutive 100-second intervals, and each plotted point on a graph represents the number of packets sent versus the number of loss indications during a 100 s interval
- the x-axis represents the frequency of loss indications, p
- the y-axis represents the number of packets sent

* each 100 second interval is classified into one of five categories:
- TD: did not suffer any timeout (only triple-duplicate ACKs)
- T0: suffered at least one single timeout (no exponential backoff)
- T1: suffered a single exponential backoff (double timeout)
- T2: suffered two exponential backoffs
- T3 or more: more than two exponential backoffs occurred

Models for 1 hour traces

- $N_{observed}$: number of packets sent over an interval
- $p_{observed}$: loss frequency over an interval
- $N_{predicted} = B(p_{observed}) \times 100\,\text{s}$

Models for 100 second traces
- Use the values of RTT and timeout calculated for each 100-second trace.
- In most cases, the proposed model is better than the TD-only model.

Model and Experimental Results
- The assumption that round-trip time is independent of the window size is found to be false in certain cases:
  - The assumption is checked by measuring the coefficient of correlation between the duration of round samples and the number of packets in transit during each sample.
  - For most traces the coefficient is in the range -0.1 to +0.1
  - When the receiver is at the end of a modem line, the coefficient of correlation is found to be as high as 0.97
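The independence check described above amounts to computing a correlation coefficient over per-round samples. The following is a minimal sketch, assuming the Pearson coefficient and two hypothetical lists extracted from a trace (round durations and packets in flight); the notes do not say exactly how the paper's analysis scripts computed it.

```python
import math
from typing import Sequence


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical samples: round durations (s) and packets in flight per round.
    round_durations = [0.21, 0.20, 0.22, 0.19, 0.21, 0.20]
    packets_in_flight = [4, 9, 5, 12, 7, 10]
    print(correlation(round_durations, packets_in_flight))
```

A value near zero supports the assumption that round duration does not depend on window size; a value near one, as reported for the modem link, means the round time grows with the window and the assumption fails.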

Note that the paper does not model the fast recovery mechanism. However, it is believed that the model nevertheless captures the essential elements of TCP behavior, as indicated by the generally very good fits between model predictions and measurements made on numerous commercial TCP implementations.

A Markov regenerative process is a stochastic process with the property that there exist time points at which the process restarts itself. The reward is what is earned during the time that the Markov regenerative process spends in a state i. In our case, from figure 1, the evolution of the congestion window size is a Markov regenerative process, since it restarts itself every time a TD loss indication occurs. The reward is the number of packets that can be sent while the sender is in $TDP_i$, i.e., $Y_i$. For a renewal reward process that starts over again at the cycle time $T_1$,

average reward per unit time = E[rewards by time $T_1$] / E[$T_1$]

* tcpdump was first written by Van Jacobson as a tool for monitoring TCP performance. The simplest way to use tcpdump is to run it with just an `-i' switch to specify which network interface should be used. This will dump summary information for every Internet packet received or transmitted on the interface. tcpdump must be able to put the interface (typically an Ethernet) into promiscuous mode to read all the network traffic. Currently supported systems include SunOS, Ultrix, and most BSDs. Linux is not supported, though there have been reports of a port.
* tcptrace is a tool written by Shawn Ostermann at Ohio University for the analysis of TCP dump files. It can take as input the files produced by several popular packet-capture programs, including tcpdump, snoop, etherpeek, HP Net Metrix, and WinDump. tcptrace can produce several different types of output containing information on each connection seen, such as elapsed time, bytes and segments sent and received, retransmissions, round-trip times, window advertisements, throughput, and more. It can also produce a number of graphs for further analysis.
