traﬃc concept, measurements, statisticstraﬃc modeling d.moltchanov, tut, 2005 n(t), number of...

Traffic concept, measurements, statistics

Lecturer: Dmitri A. Moltchanov

E-mail: [email protected]

http://www.cs.tut.fi/˜moltchan/modsim/

http://www.cs.tut.fi/kurssit/TLT-2706/

Traffic modeling D.Moltchanov, TUT, 2005

OUTLINE:

• Traffic concept;

• Traffic measurements;

• Step-by-step traffic modeling procedure;

• Points of interest in traffic modeling;

• Observations from Internet traffic measurements;

• What statistics to capture;

• Estimation of the statistics;

• Choosing a candidate model;

• Fitting parameters of the model;

• Testing for accuracy of approximation;

• Example of the traffic modeling procedure.

Lecture: Traffic concept, measurements, models 2


1. Importance of the trafficThe cost of any telecommunication system depends on the amount of traffic.

The main aims when planning a telecommunication system is to:

• adjust the amount of equipment so that the required quality if satisfied;

• use the equipment as efficient as possible;

• keep costs as small as possible.

Teletraffic theory deals with developing methods suitable for:

• optimization of the structure of the network to satisfy a given traffic;

• adjustment of the amount of equipment to satisfy a given traffic.

Since these both tasks depends upon the amount of traffic we have to define:

• what is the traffic?

• what is the unit of traffic?

We distinguish between circuit-switched and packet switched networks.



2. Traffic in circuit-switched networksDefinition: traffic intensity in a pool of resources at t is the number of busy resources at t.

Mean traffic intensity is given by:

Y (T ) =1

T

∫ T

0

n(t)dt, (1)

• where n(t) denotes the number of occupied resources (servers) at the time t.

Note the following:

• a pool of resources may be any group of certain servers (i.e. number of trunks);

• statistical moments of the traffic intensity may be calculated for a given period of time T ;

• traffic intensity is usually referred to as average traffic intensity.

Definition: carried traffic AC is the traffic carried by the group of servers during interval T .



n(t), number of busy trunks

t, time

instantaneous traffic intensity

average traffic intensity

Figure 1: Illustration of the average traffic intensity.

The following is important:

• the carried traffic cannot exceed the number of trunks;

• a single trunk can at most can carry one Erlang of the traffic!

The total traffic carried in a period of time T is called a traffic volume (Erlang-hours).



Note the following:

• carried traffic AC is often proportional to income of a network operator;

• losses are usually due to inability to carry all traffic!

Definition: offered traffic A:

• traffic which would be carried if no arrivals were rejected due to a lack of capacity;

• this concept is usually used in theoretical studies:

How to compute offered traffic:

A = λ × s. (2)

• λ: mean number of arrivals per a time unit;

• s: mean service time of arrival.



So far we defined two different types of traffic. These are:

• offered traffic A;

• carried traffic AC .

• volume of these traffics (A, AC) may not be equal to each other.

Lost (rejected, blocked) traffic: difference between offered traffic and carried traffic:

AL = A − AC . (3)

• the value of lost traffic is reduced by increasing the capacity of the system;

• when the capacity of the system tends to infinity AC → A.

Example: arrival intensity is 10 arrs/m.; mean service time is 2 minutes:

A = 10 · 2 = 20(Erlangs). (4)



2.1. Traffic variations

Traffic in circuit-switched networks varies according to activity of subscribers:

• traffic is generated by single sources – subscribers;

• subscribers are assumed to be independent.

Measurements have shown that traffic is characterized by two major components:

• stochastic component:

– random generation of calls by subscribers.

• deterministic component:

– nearly deterministic variability of number of calls over days, weeks, months and even years

– cause: subscribers’ needs to make more calls in a certain period of time.

Variations in traffic can be divided into:

• variations in service times;

• variations in number of calls.



Figure 2: Average number of voice calls: 10 workdays averages, taken from V.B. Iversen.



Figure 3: Average service times for voice calls: taken from V.B. Iversen.



Peaks in average number of calls and service time usually depends on:

• whether the exchange is located in residential, rural, or business area;

• on what traffic we look at.

Deterministic nature: traffic patterns looks very similar for a different days:

• traffic patterns are similar during week-days;

• traffic patterns are similar during week-end days;

• traffic patterns are different during week-days and week-end days.

Natural question:

• when the peak number of calls occurs?

• is this peak the same for each day?



Generally, deterministic variations in the traffic can be divided to:

• 24 hours variations:

– as those we considered previously.

• weekly variations:

– highest traffic: Monday, then on Friday, Tuesday, Wednesday, Thursday, Saturday, Sunday.

• year variations:

– for example: there is a very low traffic in vacation times (July in Finland).

• large scale variation:

– traffic increases depending on technology development and economic state of the society.


• we considered a traditional PSTN traffic;

• other traffic types or other circuit-switched networks have their own patterns and variations.

– dial-up Internet via modems;

– voice calls in GSM/IS-95/UMTS mobile networks.



Figure 4: Average number of modem calls: single day, taken from V.B. Iversen, year 1999.



Figure 5: Average service times for modem calls: taken from V.B. Iversen.



2.2. The concept of busy hour

Time consistent busy hour (TCBH):

• time period of 60 minutes during which, measured on a long time, the highest traffic occurs.

Note the following:

• the highest traffic may not occur during the same time every day!

time0 4 8 12 16 20 24

calls/minute

time0 4 8 12 16 20 24

calls/minute



2.3. Blocking concept

Circuit-switched telecommunications systems:

• are dimensioned so that subscribers are sharing the expensive equipment:

– never dimensioned so that all subscribers can connect at the same time;

– equipment which is separate for each subscriber should be made as cheap as possible.

• there is a concentration from the subscribers towards exchange....

custo

mers

Exchange

...

to domestic concentrator

...to international concentrator

Figure 6: Sketch of the telephone exchange.



What are usual dimensioning rules applied:

• about 5 − 8% of subscribers should be able to make domestic calls at the same time;

– note that each phone is usually used 10 − 16% of the time.

• about 1% of subscribers should be able to make international calls at the same time.

How it is made and what it gives:

• statistical multiplexing at the call level;

• subscriber feels that he has an unrestricted access to all resources.

There should be some problems:

• resources are shared with many others;

• it is possible that a subscriber cannot establish a call.



When it is not possible to to establish a call it:

• has to wait;

• has to be blocked.

Depending on how system operates we distinguish between:

• loss systems: arrival is lost when there are insufficient resources in the system;

• waiting systems: arrival waits when there are insufficient resources in the system;

• mixed loss-waiting systems: depending on arrival it can wait of get lost.

Networks performance measures in loss systems can be expressed using:

• call congestion B: fraction of call attempts that observes all servers busy;

• time congestion E: fraction of time when all servers are busy;

• traffic congestion C: the fraction of the offered traffic that is not carried.



3. Traffic in packet-switched networksIn data networks we talk about transmission needs:

• any packet can be of s units in length (bits or bytes);

• any link is characterized by a capacity φ (units per second).

Then the service time for a customer (so-called transmission time) is:

s

φ. (5)

Utilization ρ of the link is:

ρ =λs

φ, 0 < ρ < 1. (6)

• λ is arrival rate of packets per time unit.



4. Traffic measurementsTo provide quantitative analysis of telecommunication system we have to:

• provide adequate traffic model:

– determine important statistical parameters of input traffic:

∗ measure traffic patterns;

∗ compute statistical parameters of the patterns.

– match these parameters using appropriate traffic model.

• provide model of the service process;

• carry out analysis of the system under different conditions.

Any traffic measurement is characterized by the following three parameters:

• period of measurement;

• type of measurement;

• parameters under consideration.



4.1. Why do we need traffic measurements?

There are four main reasons why network traffic measurements are very useful:

• network troubleshooting:

– malfunctioning equipment may disrupt the operation of the network;

– examples: broadcast storm, illegal packet sizes, incorrect addresses, security attacks;

– measurements: may help to locate this equipment.

• protocol/application debugging:

– developers want to test a new, improved version of protocol/application;

– measurements: may reveal ’hidden problems’ of the protocol/applications.

• traffic characterization:

– what is the current workload, what are the future trends?;

– measurements: are required to answer these questions.

• performance evaluation:

– what is the performance of the router, application?

– measurements: are required to characterize the workload.



4.2. Methods of measurements

Traffic measurements can be implemented using the following operations:

• observing number of events:

– collecting a number of events over a constant time intervals;

– it corresponds to number representation of arrival process;

• observing time intervals:

– collecting data about the lengths of time intervals between events;

– this corresponds to an interval representation of arrival process.

Using these operations we may obtain any characteristic of a traffic process to:

• apply these characteristics to develop a traffic model;

• directly to dimension a system under consideration.

Traffic measuring methods can be divided into two major categories:

• continuous measuring methods: measuring equipment is activated at the instant of the event;

• discrete measuring methods.



4.3. Discrete measurements

Discrete measurement (so-called scanning method):

• time points are chosen;

• measuring equipment tests whether there have been changes at the measuring time points;

• time points are usually equally separated;

• events between two time points are considered as happened together.

time

time

points of measutements

1 1 2 1 2 1 1

Figure 7: Discrete measurements.



4.4. Active and passive measurements

Whether we monitor real network traffic or some kind of ’artificial’ traffic:

• active traffic measurements;

• passive traffic measurements.

Active measurements:

• packets are generated by a tool to probe the network and measure characteristics:

– ping: tool to estimate network latency to a particular destination in the Internet;

– tracert: tool to determine routing paths;

– pathchar: tool to estimate link capacities and latencies along the Internet path.

Passive measurements:

• network monitor is used to observe and record traffic on an operational network;

• most free measurement tools fall into this category:

– tcpdump;

– Etherial.



4.5. Software and hardware-based measurements tools

Depending on implementation we distinguish between:

• hardware-based traffic measurement tools;

• software-based traffic measurement tools.

Hardware-based measurement tools:

• equipment (device) with specific functionality;

• often referred to us as traffic analyzers;

• expensive: depends on the number of network interfaces, storage capacity, analysis capabilities;

• usually provide on-line statistical traffic analysis.

Software-based measurement tools:

• specific programs developed for collection and analysis of data;

• are not so expensive sometimes providing the same functionality;

• some examples: tcpdump, Etherial, ping;

• non-specific examples: web servers, proxies, firewalls, providing log files.



Figure 8: An example of the main window of Etherial with captured trace.



5. Step-by-step traffic modeling procedureStep-by-step procedure:

• determine the level of interest;

• measure traffic at the point of interest;

• decide what statistics should be captured;

• estimate statistics of traffic observations:

• choose a candidate model;

• fit parameters of the model;

• test accuracy of the model.

We will be dealing with:

• traffic in packet-switched networks;

• different levels of aggregation.



6. Level of interest for traffic modelingTraffic can be represented:

• at the session level:

– request for downloading files from ftp server;

– request for downloading pages from www server.

• at the packet level.

Which level to choose:

• depends on particular task;

General notes:

• session level:

– usually claimed for follow Poisson process;

– reality might be quite different!

• packet level: any behavior should be expected.



7. Points of interest in traffic modelingYou have to take into account:

• where you are asked to model the traffic (evaluate performance)?

customer side network side

2

3

1

Figure 9: Points at which traffic is usually measured and modeled.



7.1. Point 1: particular application:

We distinguish between:

• voice application;

• video application;

• data transfers:

– ftp information;

– http information;

– ssh information.

What is important: properties of transport layer protocol and application:

• UDP: no specific pattern:

– does not affect much properties of application;

– you may model traffic of application only.

• TCP: very specific pattern:

– affect data transmission;

– should be taken into account.



Co

ng

estio

nw

ind

ow

,M

SS

s

time

1

4

8

16

18

TCP Tahoe

TCP Reno

Figure 10: TCP traffic pattern: TCP Reno and TCP Tahoe.

• how much traffic does the application have?

• ftp: large files; ssh, e-mail, http: large and small transfers.



7.2. Point 2: aggregated traffic from a number of applications

We distinguish between:

• heterogenous applications;

• homogenous applications.

customer side

voip

voip

voip

voip

customer side

voip

video

ftp

voip

Figure 11: Homogenous and heterogenous traffic aggregates.



7.3. Point 3: aggregated network traffic

What is that:

• aggregation of a large number of flows.

access router

access router

access router

backbone router

Figure 12: Aggregated backbone traffic.

• may have quite sophisticated properties;

• practically, cannot be obtained as superposition of individual flows.



8. Observations from the Internet traffic measurementsThe most important fact: Internet traffic changes in time.

Recent observations, trends and facts on the Internet traffic:

• TCP accounts for most of the packet traffic in the Internet;

• traffic flows are bidirectional, but often asymmetric;

• most TCP sessions are short-lived;

• the packet arrival process in the Internet is not Poisson;

• the session arrival process may be approximated by Poisson distribution;

• packet sizes are bimodally distributed;

• packet traffic is non-uniformly distributed;

• aggregate network traffic may have multi-fractal nature;

• Internet traffic continues to changes.



8.1. Domination of TCP

TCP accounts for most of the packet traffic in the Internet:

• beginning of 90th:

– it was firstly observed that TCP dominates.

• middle of 90th:

– introduction of multimedia services;

– development of RTP, RTCP, RTSP...;

– UDP share was expected to grow.

• beginning on 2000:

– TCP still dominant protocol;

– multimedia content also extensively use TCP.

Reasons of TCP dominance:

• multimedia is usually place of web pages;

• availability of TCP.



8.2. Bidirectional asymmetric traffic flows

Traffic flows are bidirectional, but often asymmetric:

• bidirectional exchange of data:

– ftp, http, ssh, e-mail, etc.

• asymmetric traffic pattern:

– ftp, http, ssh, e-mail are all request-response based protocols.

• what are the current trends:

– p2p applications may generate bidirectional asymmetric traffic;

– p2p applications: napster, kazaa, etc.



8.3. Short-lived TCP sessions

Most TCP sessions are short-lived:

• almost 90% of TCP connections exchange fewer than 10Kbytes of data in few seconds:

– WWW service: request - response;

– http v1.0: separate connection for an object on a page;

– http v1.1: single connection for a page;

– most pages and objects are less than 10Kbytes in length.

• what the effect:

– heavy-tailed distribution of session sizes.

– heavy-tail: a lot of frequencies corresponding to large histogram bins;

– reasons for heavy-tail: most of the sessions are small, some a big (ftp).



8.4. Packet arrivals are not homogenous Poisson

The packet arrival process in the Internet is not homogenous Poisson:

• common belief was:

– aggregated traffic is Poisson (or at least Markovian) in nature;

– a lot of studies have been made with Poisson assumption.

• reality:

– arrival process is not homogenous Poisson;

– interarrival times are at least correlated;

– packet arrival process may not be event covariance stationary;

– there can be so-called packet ’clumps’ or ’batches’

• result: packet arrival process is far from common assumptions.



8.5. Session arrivals are Poisson

The session arrival process may be approximated by Poisson distribution:

• what are the reasons:

– there are a lot of users getting access to a certain site;

– users can be assume the be independent;

– situation is similar to telephone network where Poisson assumption holds.



8.6. Special distribution of packet sizes

Packet sizes are bimodally distributed:

• around 50% of packets are as large as possible:

– these are TCP data packets;

– recall, it is determined by MTU of Ethernet: 1500 bytes;

– around 50% of packets are 1500 bytes in length.

• around 40% of packets are as small as possible:

– these are TCP ACKs;

– recall, it is determined by headers of TCP (20 bytes), IP (20 bytes): 40 bytes;

– around 40% of packets are 40 bytes in length.

• around 10% of packet lengths are uniformly distributed between 40 and 1500;

• additional peaks: fragmentation of IP packets.



8.7. Non-uniform distribution of the traffic

Note: it is related to flows in the network!

Packet traffic is non-uniformly distributed:

• around 90% of traffic is between 10% of nodes:

– explanation: client-server configuration of most services.

• this property may change:

– p2p applications.



8.8. Unknown patterns of the packet traffic

What was suggested over decades:

• 80th: Poisson nature of the aggregated packet traffic:

– common agreement: this assumption is no longer valid!

• 90th: self-similar nature of the aggregated traffic:

– the most respected hypotheses today;

– seems a little bit strange.

• 2000: is it simply non-stationary?

– probably the correct answer:

– small timescales: stationary;

– long timescales: non-stationary.



8.9. Changing nature of Internet traffic

Internet traffic continuous to changes:

• what applications dominated over decades:

– 80 − 95: e-mail, remote access;

– 95−: WWW, large file transfers;

– predictions: 2010: p2p, WWW.

• how to deal with:

– you cannot rely upon ’old measurements’;

– new measurements are required.



8.10. Find more about Internet traffic

Where you may find more about current Internet traffic:

• Internet traffic archive: http://ita.ee.lbl.gov;

• Internet traffic report: http://www.internettrafficreport.com;

• National laboratory for applied network research (NLANR): http://www.nlanr.net;

• NLANR measurement and operations analysis team (MOAT): http://moat.nlanr.net;

• National Internet measurement infrastructure (NIMI): http://www.ncne.nlanr.net/nimi;

• tcpdump measurements software: http://www.tcpdump.org;

• etherial software: http://www.etherial.com;

• research papers:

– free search engine: http://researchindex.org/;

– free search engine: http://scholar.google.com/;

– ieee: http://ieeexplore.ieee.org/Xplore/guesthome.jsp.



9. What statistics to captureGeneral answer is not straightforward:

• what are the aims of traffic modeling:

– just propose a new, better traffic model?

– carry out performance evaluation?

– what kind of performance evaluation simulation/analytic?

• how close you are going to describe the traffic:

– trade-off between accuracy and complexity!

– is it sufficient just to get basic ideas?

– is there interest in precise parameters?

• what statistics are important:

– mean, variance, distribution, ACF?

– you can never say before you get results;

– you can never say before you capture a certain parameter.



9.1. Description of stochastic processes

Note the following for process {S(n), n = 0, 1, . . . }:• full information is given by N -dimensional distribution:

S(t1, s1, t2, s2, . . . ) = Pr{S(t1) ≤ s1, S(t2) ≤ s2, . . . }. (7)

– impossible to deal with;

– never considered in teletraffic theory.

• in most general case we operate with:

– empirical distribution (in terms of the histogram);

– autocorrelation function.

• note the following:

– distribution and ACF: does not fully describe arbitrary process;

– distribution and ACF: gives full description for processes with Normal distribution.



fi,E(�)

0 5 10 15 20 25 30 350

0.028

0.055

0.083

0.11

i�

(a) Empirical distribution

KY(i)

0 2 4 6 8 100

0.2

0.4

0.6

0.8

1

i, lag

(b) NACF

Figure 13: Empirical distribution and NACF with approximations.

It is advisable to construct both:

• you get a picture how distribution behaves;

• you get a picture how ACF behaves.



fX(x)

x

normal

exponential, hyperexponential

gamma, beta, Erlang, Weibull

Pareto

Figure 14: Forms of distribution.

Note: form may significantly affect results of performance analysis!



If distribution is hard to capture, capture moments:

• 1st moment: mean;

• 2nd moment: variance;

• 3rd moment: skewness;

• 4th moment: kurtosis;

• higher moments.

If ACF is hard to capture, capture:

• lag-1 ACF only;

• short-range behavior of ACF;

• long-range behaviour of ACF.



9.2. Importance of moments

fX(x)

xmean

variance

Figure 15: Mean and variance of distribution.

• mean: measure of central tendency;

• variance: measure of dispersion.



fX(x)

x

sk = 0sk < 0 sk > 0

Figure 16: Skewness of the distribution.

• skewness: normalized third central moment (for symmetric sk = 0);

• skewness: measure of the lopsidedness of the distribution.



fX(x)

x

kurt 1

kurt 2

kurt 2 > kurt 2

Figure 17: Kurtosis of the distribution.

• kurtosis: normalized fourth central moment - 3;

• kurtosis: whether the distribution is tall and skinny or short and squat compared to normal.



Notes on fitting moments:

• if 4 first moments are fitted you may expect fair approximation of histogram;

• sometimes it is easier to fit histogram than more than 2 moments.

fX(x)

x

Figure 18: Distribution to be approximated.



fX(x)

x

Figure 19: Fitting mean and variance results in a number of forms of a distribution.

• different skewness;

• different length on fX(x) axis.



fX(x)

x

Figure 20: Fitting mean, variance and skewness limits a number of forms of a distribution.

We still have differences:

• different length on fX(x) axis.



fX(x)

x

Figure 21: Fitting mean, variance, skewness and kurtosis may result in a desired distribution.

• if we fit kurtosis we are sure that the length on fX(x) axis is the same;

• 4 moments are fairly enough to get good fitting.



9.3. Importance of the ACF

Note the following:

• autocorrelation affects results of performance analysis:

The effect may not be straightforward

• autocorrelation in packet arrivals usually leads to more losses;

• autocorrelation in bit errors of the wireless channel may lead to less losses.

ACF manifests itself in two effects:

• short-range dependence:

– more losses and larger delays compared to no autocorrelation.

• long-range dependence:

– more losses and larger delays compared to short-range dependent models.



K(i)

i

short-range dependence

long-range dependence

i < 10-20 i > 50

Figure 22: Long and short-range dependence.

• short-range dependence: K(i) = 0 already for some i < 20 ∼ 30;

• long-range dependence: K(i) �= 0, i > 50.



i

single exponential/geometric component

power decay (long-range dependence)

two exponential/geometric components

K(i)

Figure 23: Common forms of the normalized ACF in traffic models.

• exponential/geometric decay: short-range dependence;

• power decay: long-range dependence.



i

mixture of exponential terms

1

( )k

i

j j

j

K i � ��

� �

K(i)

Figure 24: Power decay can be approximated by sum of exponentials (to the some extent).

• some models have such a kind of ACF;

• example: Markov modulated processes.



Note on long-range dependence:

• may exists as a consequence of non-stationarity!

• you have to be pretty sure in stationarity:

– anomaly behavior of ACF may be the sign of non-stationarity;

– example: ACF is not strictly decreasing.

– example: ACF is slowly decreasing.


• observations of stationary traffic usually have strictly decreasing ACF;

• however, note the following:

– some anomalies can be due to outbursts that may not be important;

– some anomalies are important.

• you have to have intuition!



Y(i)

0 25 50 75 1005

0

5

10

15

i, timeKY(i)

0 25 50 75 1001

0

1

i, lag

Y(i)

0 25 50 75 1000

5

10

15

i, timeKY(i)

0 25 50 75 1002

1

0

1

i, lag

Figure 25: Traces and NACFs of non-stationary observations.



Y(i)

0 25 50 75 1002

0

2

4

i, timeKY(i)

0 25 50 75 1000.5

0

0.5

1

i, lag

Y(i)

0 25 50 75 10010

0

10

20

i, timeKY(i)

0 25 50 75 1000.5

0

0.5

1

i, lag

Figure 26: Traces and NACFs of stationary observations.



9.4. What statistics are important?

Note the following:

• mean value:

– must be captured;

• variance:

– must be captured;

– one may use standard deviation or coefficient of variation instead.

• lag-1 ACF:

– was found to be important;

– may have unexpected effect.

• structure of the ACF:

– sometimes may affect significantly (e.g. long-range dependence).

• histogram of relative frequencies:

– captures all moments of one-dimensional distribution;

– required when you have to be pretty sure.



9.5. Example of importance of parameters

We consider: SBP+SPP/D/1/K queuing system:

• SBP: switched Bernoulli process;

• SPP: switched Poisson process;

• arrivals of SBP have priority over SPP;

• constant service time of one slot;

• K waiting positions;

• parameters of interest: pdf of waiting time fL(l), pdf of losses fQ(q).

Application: frame transmission process over wireless channels:

• SBP: frame error process;

• SPP: frame arrival process;

• limited buffer of the mobile terminal;

• single wireless channel.



Figure 27: Effect of the lag-1 autocorrelation coefficient of SPP.



Figure 28: Effect of the variance of SPP.



Figure 29: Effect of the form of the distribution of SPP.



Figure 30: Effect of the lag-1 autocorrelation coefficient of SBP.



Figure 31: Effect of the variance of SBP.



9.6. Common matching schemes

Common matching:

• mean and variance:

– usually easy to do;

– used to get mean performance parameters.

• mean, variance and lag-1 ACF:

– there are a number of models and algorithms;

– relatively easy to do.

• mean and ACF:

– there are a number of models and algorithms;

– sometimes not easy to do.

• histogram:

– one may look for analytical distribution;

– usually easy to do using discrete distribution.

• histogram and ACF.



9.7. Classes of models and characteristics

Classes of models:

• renewal class of models:

– distribution can be arbitrary;

– ACF is zero for all lags: no autocorrelation.

• autoregressive class:

– distribution is normal;

– ACF is a sum of exponential/geometric terms.

• Markov-modulated models:

– distribution can be arbitrary;

– ACF is a sum of exponential/geometric terms.

• models with self similar properties:

– distribution can be either normal (FBM) or arbitrary (F-ARIMA);

– ACF non-zero for large lags: long-range dependence.



9.8. Receipts

You may use the following when you are to capture:

• first two moments:

– Erlang, hyperexponential, exponential distributions;

– approximation by discrete distribution: p1, p2, . . . , pk such that∑

i pi = 1.

• first m moments:

– special case of phase-type distribution;

– approximation by discrete distribution.

• first two moments and lag-1 of ACF:

– Markov modulated processes;

– autoregressive processes.

• first two moments and ACF:

– Markov modulated processes;

– autoregressive processes.



10. Estimating statistics of traffic observationsWhat is special in teletraffic:

• usually we have enough statistics to estimate;

• that is, we can capture for days...

• see, Internet Traffic Archive: http://ita.ee.lbl.gov/

What does it mean:

• recall, unbiased and consistent estimate for variance:

σ2[X] =1

N − 1

N∑i=1

(Xi − m)2. (8)

– we may not care about 1/(N − 1) and just use 1/N when N is sufficiently large.



General questions you have to answer at this step:

• is the traffic process ergodic:

– practically, there are no means to test for ergodicity;

– look for reasons for ergodicity of the traffic process;

– ergodic: a single sufficiently long observation can be further used;

– not ergodic: a set of observations must be obtained.

• are there reasons for stationarity of the traffic process?

– practically, there are no means to test for stationarity;

• if stationary (first- or second-order):

– estimate important statistics;

• if not stationary:

– try to change representation of observations;

– another representation may be stationary.



11. Choosing a candidate modelInput information

• parameters of the traffic that have to be captured;

• a set of traffic models.

What you have to know:

• traffic models and their properties;

• analytical tractability of models:

– simulation: any model is suitable;

– analytical: only tractable models are suitable.

Examples:

• analytically tractable: renewal models, Markovian models;

• analytically intractable: most non-Markovian models.



11.1. Example of the problem

Assume we have:

• observations of RV X that is defined on [0,∞);

• we have to capture first two moments of observations:

E[X], C2

< 1. (9)

What one may guess:

• Erlang distribution (E2):

– defined on [0,∞);

– has C2 < 1.

• shifted exponential distribution:

– defined on [d,∞).

– has C2 < 1.

• what to choose?



fX(x)

xd

Shifted exp: fX(x) = be-b (x-d)

E2: fX(x) = bxe-bx

Figure 32: pdfs of shifted exponential and E2 distributions.

Conclusion:

• shifted exponential does not satisfy implicit requirement X ∈ [0,∞)

• we choose Erlang distribution;



12. Self-similar trafficNote the following:

• it is a common belief nowadays:

– may not be true.

• a way to deal with aggregated network traffic:

– may not be the only approach.

Simple example of deterministic self-similar behavior: Cantor set:

• take a certain subspace of the space (assume rectangle [0, 1][0, 1] in R2);

• scale its size by 1/3 and place in corners on initial subspace;

• do the same for each obtained rectangle;

• continue...

Note: self-similar structures are sometimes called fractals.



take 2 of length 1/3 and place in corners

arrivals

Figure 33: Illustration of the 2D Cantor set and 1D Cantor set as ON/OFF traffic.



arrivals

Figure 34: Weighted Cantor set with weights 2/3 and 1/3 (we preserve the whole weight).



Figure 35: Coast of England is an example of fractals: length scales with a timescale.



Figure 36: Stochastic self-similarity in the network traffic.



12.1. Definition of self-similarity

Assume we are working with:

• {Y (t), t = 0, 1, . . . }: cumulative arrival process:

– may not be stationary!

• {X(t), t = 0, 1, . . . }, X(t) = Y (t + 1) − Y (t): process of increments:

– first order difference process: X(t) = ∇Y (t) (recall ARIMA);

– this process is should be covariance stationary with zero mean.

Y(t) X(t)

0 2 4 6 8 1040

20

0

20

40

60

0 2 4 6 8 1050

0

50

t, time t, time

Figure 37: Example of processes {Y (t), t = 0, 1, . . . } and {X(t), t = 0, 1, . . . }.



Define aggregated averaged process of {X(t), t = 0, 1, . . . } as:

X(m)(n) =1

m(Xnm−m+1 + · · · + Xnm) =

1

m

mn∑t=m(n−1)+1

X(t). (10)

t

t

t

{X(n),n = 0,1,..}

{X(5)

(n),n = 5,10,..}

{X(10)

(n),n = 10,20,..}

Figure 38: Averaging the process {X(t), t = 0, 1, . . . }.

Note: X(m)(n) is the sample mean of the sequence (Xnm−m+1 + · · · + Xnm).



A process {X(t), t = 0, 1, . . . } is exactly second-order self-similar if:

γ(k) =σ2

2[(k + 1)2H − 2k2H + (k − 1)2H ], k = 1, 2, . . . , (11)

• γ(k) is ACF of {X(t), t = 0, 1, . . . };• such structure implies that γ(k) = γ(m)(k) for all m ≥ 1;

• H is called Hurst parameter;

• for self-similar processes 0.5 < H < 1.

A process {X(t), t = 0, 1, . . . } is asymptotically second-order self-similar if:

γ(m)(k) = limm→∞

σ2

2[(k + 1)2H − 2k2H + (k − 1)2H ], k = 1, 2, . . . . (12)

• γ(m)(k) is ACF of {X(m)(n), n = 0, 1, . . . }.Note the following:

• exactly self-similar: correlation structure is strictly preserved over timescales;

• asymptotically self-similar: correlation structure is preserved under time-aggregation.



12.2. Self-similarity in terms of distribution

Consider the following:

• continuous time process {Y (t), t ∈ };

Process {Y (t), t ∈ } is self-similar with certain 0 < H < 1 if:

Y (t) = a−HY (at), a > 0, t ≥ 0. (13)

• it means: {Y (t), t ∈ } and {Y (at), t ∈ } normalized by a−H have the same distribution;

• we usually interpret {Y (t), t ∈ } as cumulative arrival function.

What is important:

• {Y (t), t ∈ } cannot be stationary du to normalization factor a−H !

• its increment process can be (does not mean it must) covariance stationary.



12.3. Long range dependence

Determine variance of {X(m)(n), n = 0, 1, . . . } via variance of {X(t), t = 0, 1, . . . }:

σ2[X(m)] = E2

[1

m(Xnm−m+1 + · · · + Xnm)

]−(

E1

m[Xnm−m+1 + · · · + Xnm]

)2

=

=σ2[X]

m+

2

m2

m∑k=1

(m − k)r(k) =

= σ2[X]

[1 + 2

m∑k=1

(1 − k

m

)r(k)

]m−1. (14)

• r(k), k = 1, 2, . . . is the NACF of {X(t), t = 0, 1, . . . }.Note: if process {X(n), n = 0, 1, . . . } is uncorrelated, {X(n)(m), n = 0, 1, . . . } is uncorrelated.

r(k) = 0, k = 1, 2, . . . , σ2[X(m)] = D[X]m−1. (15)

Note: if process {X(n), n = 0, 1, . . . } is correlated, then for large m we have:

σ2[X(m)] = σ2[X]

(2

m∑k=1

r(k)

)m−1. (16)



Consider the case when r(k) �= 0 and∑∞

k=−∞ r(k) < ∞:

σ2[X(m)] = σ2[X]Cm−1, (17)

• C is some constant;

• sample variances decay to zero as fact as m−1.

Models that MAY have such behavior:

• Markovian models;

• ARMA(p, q) and its special cases (MA(q), AR(p)).

Where sample variance may decay at slower rate than m−1:

• {X(t), t = 0, 1, . . . } is self-similar process;

• {X(t), t = 0, 1, . . . } is non-stationary process.

Where practically: aggregated traffic.



How to model slower than m−1 decay?

• one should have decay of σ2[X(m)] proportional to m−a, a ∈ (0, 1), why?

– if a = 1 we have limited serial correlation or no correlation at all;

– if a = 0 the process degenerates to constant case.

• it requires the sum in expression for σ2[X(m)] must be proportional to m1−a:

m∑k=1

r(k) = Cm1−a, a ∈ (0, 1). (18)

When α < 1, previous implies that the ACF decays so slowly that it is not summable:

∞∑k=−∞

r(k) → ∞. (19)

An example of such ACF is a power decaying ACF:

r(k) ∼ Ck−α, α ∈ (0, 1). (20)

• in this case we say that ACF decays as power function.



KX(i)

0 20 40 60 80 1000.2

0

0.2

0.4

0.6

0.8

empirical ACF

error: +2/sqrt(n)

error: -2/sqrt(n)

i, time

KX(i)

0 20 40 60 80 1000.2

0

0.2

0.4

0.6

0.8

empirical ACF

error: +2/sqrt(n)

error: -2/sqrt(n)

i, time

Figure 39: NACFs of short-range dependent and long-range dependent processes.



12.4. Values of H

We have the following possibilities:

• H = 0.5: process is completely uncorrelated,∑∞

k=−∞ r(k) is finite;

• 0 < H < 0.5:∑∞

k=−∞ r(k) = 0 that is rarely observed in applications;

• H = 1 leads to r(k) = 1, k = 1, 2, . . . there is linear dependence in the series;

• H > 1: prohibited due to stationarity exposed on {X(t), t = 0, 1, . . . }.

Self-similarity and long-range dependence:

• there are long-range dependent processes that are not self-similar;

• there are self-similar processes that are not long-range dependent;

• asymptotic self-similar:

– self-similarity leads to long-range dependence;

– long-range dependence leads to self-similarity.



12.5. Heavy-tailed distributions

Observe the following:

• heavy-tailed distribution is when CPDF is:

Pr{Z > z} ≈ x−a. (21)

– 0 < a < 2 is some constant.

– tails of distribution decreases slowly.

• short-tailed distribution is when CPDF is:

Pr{Z > z} ≈ e−z. (22)

– tails of distribution decreases quickly.

Mote the following:

• 0 < a < 1: infinite mean, infinite variance;

• 1 < a < 2: infinite variance.

We are interested: 1 < a < 2 when only variance is infinite.



Common heavy-tailed distribution is Pareto:

Pr{Z ≤ z} = 1 −(a

x

)a

, b ≤ x, (23)

• 0 < a < 2 is the scale parameter;

• b is the location parameter;

• mean is given by:

E[Z] =ab

a − 1. (24)

• variance is infinite.

Note the following:

• gamma distribution has subexponential tail but has a finite variance;

• weibull distribution has subexponential tail but has a finite variance.

Note: main characteristic is high variability:

• may take very large values with non-negligible probabilities;

• sample: a lot of small values and some values are extremely big.



fY(y)

50 8.33 33.33 75 116.67 158.33 2000

0.01

0.02

0.03

0.04

Weibull distributiony

fY(y)

50 8.33 33.33 75 116.67 158.33 2000

0.01

0.02

0.03

0.04

Normal distribution

Exponential distributiony

Figure 40: Examples of short-tailed and heavy-tailed distributions.



12.6. Heavy-tails and predictability

Assume the following:

• we have duration of a lifetime of a certain thing with RV Z;

• time is discrete t = 0, 1, . . . ;

• {A(t), t = 0, 1, . . . } is an indicator process:

– A(t) = 1 something is in;

– A(t) = 0 something already expired.

• we are interested in: something still persists in the future given that it persists now:

U(τ) = Pr{A(τ + 1) = 1|A(τ) = 1}, 1 ≤ t ≤ τ. (25)

We can express U(τ) as:

U(τ) = 1 − Pr{Z = τ}Pr{Z ≥ τ} . (26)



Assume short-tailed distribution in the form Pr{Z > x} ≈ c1e−c2x:

U(τ) = 1 − Pr{Z = τ}Pr{Z ≥ τ} ≈

≈ 1 − c1e−c2τ − c1e

−c2(τ+1)

c1e−c2τ= 1 − (1 − e−c2) = e−c2 . (27)

• prediction is the same for all τ !

• recall exponential distribution.

Assume long-tailed distribution:

U(τ) = 1 − Pr{Z = τ}Pr{Z ≥ τ} ≈

≈ 1 − cτ−a − c(τ + 1)−a

cτ−a= 1 −

(1 −

(τ

τ + 1

)a)=

(τ

τ + 1

)a

. (28)

• prediction is different for all τ !

• when τ → ∞, U(τ) → 1!



12.7. Heavy-tails as a cause of LRD

Let us introduce the following:

• FBM: fractional Brownian motion:

– non-stationary Gaussian process with 0 < H < 1.

• FGN: fractional Brownian noise.

– stationary increment process of FBM with 0 < H < 1;

– distribution is also Gaussian.

Note that when H = 0.5:

• FBM reduces to Brownian motion:

– also non-stationary process.

• FGN reduces to Gaussian noise:

– stationary process;

– completely uncorrelated.

Note: for Gaussian processes distributional and second-order self-similarity are equivalent!



Consider the following:

• N independent on/off sources Xi(t), i = 1, 2, . . . , N ;

• aggregated process SN(t) =∑N

i=1 X(i)(t).

Figure 41: Examples of on/off sources and their aggregation.



Consider cumulative process YN(Tt):

YN(Tt) =

∫ Tt

0

(N∑

i=1

X(i)(s)

)ds. (29)

• T > 0 is a scale factor.

If the following holds:

Pr{τon > x} ≈ cx−a, x → ∞, 1 < a < 2, c > 0, (30)

For large T and N process YN(Tt) behaves as:

E[τon]

E[τoff ] + E[τon]NTt + CN1/2THBH(t) (31)

• H = (3 − a)/2 is Hurst parameter;

• BH(t) is FBM with Hurst parameter H;

• C > 0 is a constant depending on distributions of τon and τoff ;

• distribution of the off times can be arbitrary (short-tailed or heavy-tailed).



What we can say about YN(Tt):

• long range dependent (0.5 < H < 1) if 1 < a < 2: on intervals are heavy-tailed:

– distribution of off intervals does not matter.

• long range dependent (0.5 < H < 1) if 1 < a < 2: off intervals are heavy-tailed:

– distribution of on intervals does not matter.

• if off and on intervals are short-tailed it is short-range dependent:

– heavy-tails are required to have self-similarity.

Practice:

• file sizes distribution has heavy-tail;

• generator: web site with downloading capabilities.



12.8. Estimating H: variance-time plot

What property we are going to use:

• self-similar process has a slowly decaying variances with increasing of m:

σ2[X](m) = σ2[X]m−β. (32)

• where H = 1 − β/2.

You have to do the following:

• determine several m (it is better to use m = 1, 10, 100, 1000, . . . );

• compute log10 σ2[X(m)/m] and log10 m for each m;

• ignore small values of m;

• fit a least squares fit line through a rest of resulting points in the plane;

• estimate the Hurst parameters as:

H = 1 − β

2(33)

– where β is the value of estimated asymptotic slope.



0 1 2 3 4 55

4

3

2

1

0

log10(�2[X(m)/m])

log10(m)

KX(i)

0 20 40 60 80 1000.2

0

0.2

0.4

0.6

0.8

empirical ACF

error: +2/sqrt(n)

error: -2/sqrt(n)

i, time

Figure 42: Example of variance-time plot and ACF of self-similar process (H ≈ 0.82).



KX(i)

0 20 40 60 80 1000.2

0

0.2

0.4

0.6

0.8

empirical ACF

error: +2/sqrt(n)

error: -2/sqrt(n)

i, time0 1 2 3 4 5

4

2.4

0.8

0.8

2.4

4

log10(�2[X(m)/m])

log10(m)

Figure 43: Example of variance-time plot and ACF of a process without self-similarity (H ≈ 0.48).



12.9. Estimating H: R/S statistics

What we use here:

• R/S (rescaled/adjusted range) statistics:

– practically, measure of decline of variance.

• R/S statistics for self-similar process:

limn→∞

E[R(n)/S(n)] ≈ cnH . (34)

• estimate of H: slope of log-log plot of R/S statistics.

How to compute R/S statistics: for each value d = 10, 20, 30, . . . do:

• compute K points of R/S statistics R(ti, d)/S(ti, d);

• starting points must satisfy (ti − 1) + d ≤ N ;

• estimate H as the slope of log-log graph of R/S statistics.



Figure 44: Algorithm to compute R/S statistics.



log(R/S)

logd

Figure 45: Estimating Hurst parameter using R/S statistics (H ≈ 0.9).



12.10. Caution!

Note the following:

• self-similar processes:

– cumulative process {Y (t), t = 0, 1, . . . } may not be stationary;

– increment process {X(t), t = 0, 1, . . . }, Xt = Yt+1 − Y (y) must be stationary;

– note: increment process is just fist difference process.

• some traffic seems to be non-stationary at all:

– deterministic variations: hourly, daily (recall PSTN traffic)!

Stationarity of the process:

• in real traffic depends of the timescale at which traffic is measured;

• recent hypothesis:

– short timescales: stationary behavior;

– long timescales: non-stationary behavior.

Another problem: distinguishing between self-similarity and non-stationarity.



Illustrative example:

• we generate process segments of which have different mean and variance;

• is this process self-similar?

– NO: both {Y (t), t = 0, 1, . . . } and X(t) = Y (t + 1) − Y (t) are non-stationary!

– can you say that observing only left figure?

Figure 46: First 5E5 and 1E4 observations of non-stationary trace.



What people usually do:

• estimate Hurst parameter or NACF;

• incorrect conclusion about self-similarity and non-stationarity!

KX(i)

0 20 40 60 80 1000.2

0

0.2

0.4

0.6

0.8

empirical ACF

error: +2/sqrt(n)

error: -2/sqrt(n)

i, time0 1 2 3 4 5

3

1.8

0.6

0.6

1.8

3

log10(�2[X(m)/m])

log10(m)

Figure 47: NACF and variance-time plot for non-stationary trace (H ≈ 0.76!!!).



13. Fit parameters of modelsNote the following:

• no general algorithms;

• algorithms are specific for a class of models;

• there could be more than a single algorithm for a chosen model;

• there could be no algorithm for a chosen model.

General procedure:

• determine parameters of the model:

– these parameters must completely characterize a model;

– not only parameters you are going to capture.

• derive equation relating measuring statistics and parameters:

– note that some parameters can be free.



14. Tests accuracy of the modelNote the following:

• sometimes is not needed:

– when you exactly match parameters.

• sometimes is needed:

– when approximation is used at a certain step.

Tests:

• compare distribution and empirical data:

– χ2 test;

– Smirnov’s test.

• compare autocorrelations:

– just visually comparing;

– Q-Q graph.



15. ExampleWhat we have to do:

• propose a model of the aggregated traffic;

• capture histogram and ACF as close as possible;

• model should be further used in simulation study.

network side

1..

.

Figure 48: The point at which traffic is to be modeled.



15.1. Measuring the traffic at the point of interest

We carried out two sufficiently long measurements:

• reality: 2000 observations may not sufficiently long!

• disclaimer: these observations do not represent real traffic of any kind!

Y(i)

0 500 1000 1500 20000

6

12

18

24

30

i

(a) Experiment 1

Y(i)

0 500 1000 1500 20000

6

12

18

24

30

i

(b) Experiment 2

Figure 49: Traffic observations obtained in two experiments.



15.2. Estimating statistics

What you may guess?

• are they stationary ergodic?

• what kind of distribution these traces come from?

• is the same approximating distribution the same for both traces?

• which model to use to capture statistics?

What to do to get basic knowledge:

• compute statistics;

• analyze statistics to identify properties.

What statistics we usually start in MODELING:

• histogram of relative frequencies;

• normalized autocorrelations function.



Histograms looks like as follows:

• are they really normal?

• testing using χ2: yes with level of significance α = 0.1!

fi,E(�)

0 5 10 15 20 25 30 350

0.028

0.055

0.083

0.11

i�

(a) Experiment 1

fi,E(�)

0 5 10 15 20 25 30 350

0.028

0.055

0.083

0.11

i�

(b) Experiment 2

Figure 50: Histograms of presented traces with normal approximations.



NACFs look like as follows:

• we have no anomalies;

• such NACFs are inherent for stationary processes.

KY(i)

0 2 4 6 8 100

0.2

0.4

0.6

0.8

1

i, lag

(a) Experiment 1

KY(i)

0 2 4 6 8 100

0.2

0.4

0.6

0.8

1

i, lag

(b) Experiment 2

Figure 51: Normalized ACFs of presented traces with geometric approximations.



15.3. Choosing a candidate model

What are our observations:

• observations are stationary ergodic: assumption;

• empirical distribution is normal;

• ACF is distributed according to a single exponential/geometric term.

Which model to guess:

• autoregressive model or order 1: AR(1):

Y (n) = φ0 + φ1Y (n − 1) + ε(n), n = 1, 2, . . . , (35)

– φ0 and φ1 are some parameters, ε ∼ N(0, σ2);

– marginal distribution is normal, NACF K(i) = φi1, i = 0, 1, . . . .

• Markov modulated model:

– may approximate Normal distribution;

– NACF is a sum of exponential/geometric terms.



15.4. Fitting AR(1) model

What we have to do: estimate the following:

φ0, φ1, σ2[ε]. (36)

Properties of AR(1) model:

• if AR(1) process is covariance stationary we have:

E[Y ] = µY , σ2[Y ] = γY (0), Cov(Y0, Yi) = γY (i). (37)

• µY , σ2[Y ] and γY (i) of AR(1) are related to φ0, φ1 and σ2[ε] as

µY =φ0

1 − φ1

, σ2[Y ] =σ2[ε]

1 − φ21

, γY (i) = φi1γY (0). (38)

• φ0, φ1 and σ2[ε] are related to statistics of observations as:

φ1 = KX(1), φ0 = µX(1 − φ1), σ2[ε] = σ2[X](1 − φ21), (39)

– KX(1), µX and σ2[X] are the lag-1 value of ACF, mean and variance of observations.



15.5. Testing for accuracy of fitting

Why we need it:

• we were asked to capture histogram and NACF;

• we fit only first two moments and lag-1 value of ACF!

Is there a case when we need not to do testing:

• assume we were to capture only µX , σ2[X] and KX(1);

• since we explicitly fit them, AR(1) model exactly represents them.

What allows us to assume we get fair approximation:

• AR(1) model is characterized by only three parameters that all were matched;

• distribution of AR(1) model is normal;

• NACF of AF(1) models is geometrically distributed.



The first step:

• generate trace from the model:

– for simplicity you may generate exactly the same amount of observation.

Test histograms using χ2 or Smirnov’s test for two samples:

• first sample: empirical observations;

• second sample: generated from model;

• hypotheses to be tested:

– H0: distributions of two samples are the same;

– H1: distributions of both samples are different.

Test NACFs:

• you may carry out visual test by plotting NACFs of both samples;

• you may test for significant correlation using Box-Ljiung statistics.


traﬃc concept, measurements, statisticstraﬃc modeling d.moltchanov, tut, 2005 n(t), number of...

Documents