[ieee africon 2007 - windhoek, south africa (2007.10.26-2007.10.28)] africon 2007 - an adjusted...




An Adjusted ARIMA Model for Internet Traffic

Huda M. A. El Hag
Faculty of Mathematical Sciences
University of Khartoum
Sudan
Email: [email protected]

Sami M. Sharif
Faculty of Engineering and Architecture
University of Khartoum
Sudan
Email: [email protected]

Abstract: Traditional time series models such as ARIMA models have been shown to be inadequate for modelling traffic exhibiting long-range dependence. In this paper we present a new model, the Adjusted ARIMA (AARIMA) model, for modelling long-range dependent Internet traffic. The AARIMA model is proposed as a quick and simple way to model Internet traffic that retains all the properties of the ARIMA models while capturing the self-similarity.

We use the Box-Jenkins methodology as a framework for our modelling procedure. We construct our model by building the best ARIMA model possible for a trace and then adding our adjustment to obtain the equivalent AARIMA model. We show that the AARIMA model gives an evident improvement over the ARIMA model on several goodness-of-fit criteria. Our main goodness-of-fit criterion is the ability to capture the Hurst parameter of the original trace being modelled: the model should not underestimate the Hurst parameter, and any overestimation should be less than or equal to 20% of the H parameter of the measured trace. The Adjusted ARIMA model is shown to accurately predict Internet traffic for up to one hour in advance. The adjustment we propose to the ARIMA model is the introduction of a feedback term made up of the first difference of the series being modelled.

We used four Hurst parameter estimators to measure the self-similarity of the measured traces and of both the AARIMA and ARIMA models for all measured traces. The AARIMA model was found to capture the long-range dependence irrespective of the estimator used.

We used the Adjusted ARIMA model to predict three Public Domain Internet traffic traces, namely a Bellcore Internet Wide Area Network External traffic trace (length 35 hours), a Bellcore Internet Wide Area Network "purple cable" trace (length half an hour) and an MPEG-1 compressed video traffic trace (length half an hour). We show that for the Public Domain traces the AARIMA model gives values of the H parameter which are more accurate than those given by the ARIMA model.

I. INTRODUCTION

Internet traffic can be described by the joint characteristics of self-similarity and long-range dependence [1]. Self-similarity means that traffic measured in a certain time interval looks like traffic measured in another appropriately scaled interval. A lot of research has been done on the measurement and analysis of network traffic. These studies address both local area networks and wide area networks. For local area networks [2][3] and wide area networks [4][5], it is shown that network traffic is self-similar and long-range dependent in nature [6][7][8].

The Hurst parameter measures the degree of self-similarity and has values in the range 1/2 < H < 1; the larger the value of H, the more self-similar the traffic is. Several estimators can be used to determine the Hurst parameter of time series data. These include Re-scaled Range analysis [9], the Periodogram method [10][11], the Aggregate Variance method [9], the Whittle estimator [12], the Absolute Moments method [11], the Variance of Residuals method [13], and the Abry-Veitch method [14][15][16][17][18][19].
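As an illustration of one of the estimators listed above, the following is a minimal sketch of the Aggregate Variance idea: the series is averaged over blocks of size m, and the variance of the block means is assumed to scale as m^(2H-2), so H is read from the slope of a log-log fit. The function name, the block sizes and the white-noise test series are assumptions chosen for the example; this is not the implementation used in the paper.

```python
import math
import random


def aggregate_variance_hurst(series, block_sizes):
    """Estimate H from the slope of log Var(block means) versus log m."""
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue  # a variance needs at least two block means
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        centre = sum(means) / n_blocks
        var = sum((x - centre) ** 2 for x in means) / n_blocks
        if var > 0:
            log_m.append(math.log(m))
            log_var.append(math.log(var))
    # Ordinary least-squares slope of log_var on log_m.
    mx = sum(log_m) / len(log_m)
    my = sum(log_var) / len(log_var)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_m, log_var))
    sxx = sum((x - mx) ** 2 for x in log_m)
    slope = sxy / sxx
    return 1.0 + slope / 2.0  # slope = 2H - 2


# Hypothetical test input: independent Gaussian noise, for which H should be near 0.5.
random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(20000)]
h_noise = aggregate_variance_hurst(white_noise, [1, 2, 4, 8, 16, 32, 64])
print(f"Estimated H for white noise: {h_noise:.2f}")
```

A long-range dependent trace would be expected to give a slope shallower than -1 and therefore an estimate above 0.5.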

To properly model Internet traffic a thorough understanding of the characteristics of network traffic is needed [20]. These characteristics can be obtained by measurement and analysis of actual network traffic. Traffic models that are simple, effective, and able to predict future traffic are needed. This paper introduces a new model, the Adjusted ARIMA (AARIMA) model, for modelling Internet traffic data at millisecond time scales. The AARIMA model is used to model real Internet traffic traces, and we evaluate the models by comparing the predictions to the actual traces. Our main goodness-of-fit criterion is the Hurst parameter. We use some other goodness-of-fit statistics to compare the AARIMA models to the equivalent ARIMA models.

Our procedure for modelling can be summarized as follows. We divide each measured trace into two parts before modelling: the first 75% to 80% of the trace is used to build the models and the remaining 20% to 25% is used to validate them. The first step in building an AARIMA model is testing the trace $y_t$ for stationarity. If the trace is not stationary, we difference it to obtain a stationary series $\nabla y_t$. The best ARIMA model is then obtained for the stationary series $y_t$ or the differenced series $\nabla y_t$. The ARIMA model is then adjusted to obtain the AARIMA model by adding a feedback term. For a non-stationary trace the feedback term is 0.5 times the first difference of the differenced trace, which gives $0.5\,\nabla(\nabla y_t) = 0.5\,\nabla^2 y_t$. For a stationary trace the feedback term is 0.5 times the first difference of the trace, which gives $0.5\,\nabla y_t$. This procedure enabled us to compare each AARIMA model with the corresponding ARIMA model. The Public Domain traces were all stationary.
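The split and the feedback term described above can be sketched as follows. This is an illustrative sketch only: the function names, the 0.8 split fraction (one value inside the stated 75% to 80% range) and the toy trace are assumptions, and the stationarity test itself is not shown.

```python
def split_trace(trace, train_fraction=0.8):
    """Return (model-building part, validation part) of a trace."""
    cut = int(len(trace) * train_fraction)
    return trace[:cut], trace[cut:]


def first_difference(series):
    """First difference: y_t - y_{t-1} for every t after the first point."""
    return [b - a for a, b in zip(series, series[1:])]


def feedback_term(series, stationary=True, weight=0.5):
    """0.5 * first difference of the series that the ARIMA model is fitted to.

    For a stationary trace that series is the trace itself; for a
    non-stationary trace it is the already differenced trace, so the
    feedback term becomes 0.5 * second difference of the trace.
    """
    base = series if stationary else first_difference(series)
    return [weight * v for v in first_difference(base)]


# Hypothetical toy trace, for illustration only.
trace = [3, 5, 4, 8, 7, 9, 12, 10, 11, 15]
train, validation = split_trace(trace, 0.8)
print(train, validation)
print(feedback_term(train, stationary=True))
print(feedback_term(train, stationary=False))
```

Each differencing step shortens the series by one point, so in practice the feedback term has to be aligned with the remaining observations before fitting.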

Four Hurst parameter estimators, namely the Periodogram method, the Aggregate Variance method, the Whittle estimator and the Variance of Residuals method, were used to find the H parameters of the AARIMA and ARIMA models and of the measured traces. We then obtained the bias in the H parameter for each model relative to its corresponding measured trace. The AARIMA model was found to be very robust with respect to all the Hurst parameter estimators used.

1-4244-0987-X/07/$25.00 ©2007 IEEE.

Three goodness-of-fit statistics, namely the Akaike Information Criterion (AIC), the Standard Error of the Regression (SER) and the Mean Absolute Error (MAE), were used to compare the AARIMA and ARIMA models to the measured traces. In all the traces the AARIMA model showed an improvement in all the criteria used to estimate goodness of fit. For both the SER and the MAE the improvement was 50%.

The rest of the paper is organized as follows. Section II reviews related work. Section III presents the Autoregressive Integrated Moving Average (ARIMA) process, which is the base for our model. Section IV introduces the new model, the Adjusted ARIMA(p,d,q) model, and describes how it can be implemented to model Internet traffic using the Box-Jenkins methodology. Section V compares the AARIMA and ARIMA models using several goodness-of-fit criteria and scatter plots, in addition to comparing both models with the measured traces using the Hurst parameter as a goodness-of-fit measure.

II. RELATED WORK

Several authors [21][22][23][24][25] have used time series analysis to model and predict Internet traffic. Some used short-range dependent models such as Autoregressive Integrated Moving Average (ARIMA) and Autoregressive Moving Average (ARMA) processes, while others used long-range dependent models such as the Fractional Autoregressive Integrated Moving Average (FARIMA) models [26][27].

The authors in [23] used time series analysis to create detailed forecasts of future backbone traffic. The resulting ARIMA model produced accurate forecasts of traffic levels up to a year in advance. The model could make reasonable predictions for two or more years into the future, suggesting that ARIMA modelling has great promise as a tool for long-range forecasting and planning. However, due to the fast growth of the Internet in terms of number of users, it is insufficient to make one network-wide forecast for the whole backbone. In [28] ARCH-based traffic forecasting was used for periodically measured nonstationary traffic. The authors in [21] used a non-linear time series model, namely the threshold autoregressive (TAR) model [29], to model Internet traffic after filtering the non-stationary TCP applications from the traffic; however, filtering part of the traffic does not give a true representation of actual Internet traffic. The authors in [25] introduced a model for long-term capacity planning. The methodology relies on wavelet multiresolution analysis (MRA) [30][31] and time series models. Using the wavelet MRA, an overall long-term trend is identified, and the fluctuations which introduce the largest variability in the trend are also identified. After identifying the components (trend and fluctuations), the authors showed that the traffic can be accurately predicted up to six months ahead.

III. AUTOREGRESSIVE INTEGRATED MOVING AVERAGE (ARIMA) PROCESSES

An Autoregressive Moving Average process ARMA(p, q) is defined by [32]:

$$ y_t = \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{s=0}^{q} \theta_s \varepsilon_{t-s} \qquad (1) $$

where $y_t$ is the time series, $\varepsilon_t$ is a white noise process, $y_{t-i}$, $i = 1, \cdots, p$ are time-lagged values of the time series, and $\phi_1, \cdots, \phi_p$, $\theta_1, \cdots, \theta_q$ are constants. Equation (1) can be rewritten as:

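$$ A(B)\,y_t = C(B)\,\varepsilon_t \qquad (2) $$

$$
\begin{aligned}
A(B) &= 1 - \alpha_1 B - \alpha_2 B^2 - \cdots - \alpha_p B^p \\
C(B) &= 1 - \beta_1 B - \beta_2 B^2 - \cdots - \beta_q B^q
\end{aligned}
\qquad (3)
$$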

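$A(B)$ and $C(B)$ are polynomials in $B$ of order $p$ and $q$ respectively, with $|\alpha_1| < 1, \cdots, |\alpha_p| < 1$ and $|\beta_1| < 1, \cdots, |\beta_q| < 1$. Comparing (2) and (3) with (1) gives $\alpha_i = \phi_i$ and, taking $\theta_0 = 1$, $\beta_s = -\theta_s$. $B$ is the lag operator, which gives the previous value of the series when placed in front of any variable with a time subscript:

$$ B(x_t) = x_{t-1} $$

$$ (1 - B)\,x_t = x_t - x_{t-1} = \Delta x_t $$

$\Delta$ is known as the difference operator; it is the same operator that is written $\nabla$ elsewhere in this paper. For stationarity, the roots of $A(B)$ should lie outside the unit circle, and for invertibility, the roots of $C(B)$ should lie outside the unit circle. This requirement, that the roots of both $A(B)$ and $C(B)$ lie outside the unit circle, ensures that values of $y_{t-j}$ and $\varepsilon_{t-j}$ in the distant past have very little effect on $y_t$ as $j \to \infty$.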
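Equation (1) can be read directly as a recursion. The sketch below simulates such a process with Gaussian white noise; the function name, the burn-in length and the ARMA(1,1) coefficients are assumptions chosen for illustration and are not fitted values from the paper.

```python
import random


def simulate_arma(phi, theta, n, seed=0, burn_in=200):
    """Generate n values of equation (1).

    phi   -- [phi_1, ..., phi_p], the autoregressive coefficients
    theta -- [theta_0, ..., theta_q]; theta[0] multiplies the current shock
    """
    rng = random.Random(seed)
    p, q = len(phi), len(theta) - 1
    y = [0.0] * p                                  # zero start-up values for the AR lags
    eps = [rng.gauss(0.0, 1.0) for _ in range(q)]  # start-up shocks for the MA lags
    for _ in range(n + burn_in):
        eps.append(rng.gauss(0.0, 1.0))
        ar = sum(phi[i] * y[-1 - i] for i in range(p))            # sum of phi_i * y_{t-i}
        ma = sum(theta[s] * eps[-1 - s] for s in range(q + 1))    # sum of theta_s * eps_{t-s}
        y.append(ar + ma)
    return y[-n:]  # drop start-up values and the burn-in period


# Hypothetical stationary ARMA(1,1): phi_1 = 0.6, theta_0 = 1, theta_1 = 0.3.
series = simulate_arma([0.6], [1.0, 0.3], n=5000)
mean = sum(series) / len(series)
lag1 = (sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
        / sum((a - mean) ** 2 for a in series))
print(f"lag-1 sample autocorrelation: {lag1:.2f}")
```

With an autoregressive coefficient of magnitude below one, the influence of early values decays geometrically, which is the "very little effect" of the distant past described above.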

When an ARMA(p, q) series is non-stationary, it needs to be differenced at least once to produce a stationary series. The resultant series is an Autoregressive Integrated Moving Average ARIMA(p, d, q) series. The parameters are:
• p is the order of the AR component
• q is the order of the MA component
• d is the number of differences required to give a stationary series

The ARIMA(p, d, q) series can be defined by the following equation, assuming that d roots lie on the unit circle [33]:

$$ A(B)\,\nabla^d y_t = C(B)\,\varepsilon_t \qquad (4) $$

where $\nabla^d = (1 - B)^d$ is the $d$th power of the differencing operator. The differenced series $w_t = \nabla^d y_t$ is a stationary ARMA(p, q) series [34][35][36][37].
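The operator $(1 - B)^d$ can be applied either by differencing $d$ times or by expanding it with binomial coefficients. The sketch below checks that both routes agree; the function names and the sample series of squares are assumptions made for the example.

```python
from math import comb


def difference(series, d=1):
    """Apply the first-difference operator d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def difference_binomial(series, d):
    """Expand (1 - B)^d: sum over k of (-1)^k * C(d, k) * y_{t-k}."""
    return [
        sum((-1) ** k * comb(d, k) * series[t - k] for k in range(d + 1))
        for t in range(d, len(series))
    ]


squares = [1, 4, 9, 16, 25, 36]
print(difference(squares, 1))           # [3, 5, 7, 9, 11]
print(difference(squares, 2))           # [2, 2, 2, 2]
print(difference_binomial(squares, 2))  # [2, 2, 2, 2]
```

A quadratic trend becomes constant after two differences, which is the sense in which differencing removes non-stationarity of this kind.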

IV. THE ADJUSTED ARIMA(P,D,Q) MODEL

The Public Domain traces are the Star Wars MPEG-1 trace [38] and the Bellcore traces [2][39], BC-Oct89Ext and BC-pOct89. The trace BC-Oct89Ext has a length of about 122797.83 seconds. The trace BC-pOct89 ("purple cable") has a length of about 1759.62 seconds. Both Bellcore traces measured Ethernet packet arrivals. The Star Wars MPEG-1 trace consisted of 40000 frames with a total length of about 1800 seconds. The number of observations used for the model and for the simulated traffic traces are shown in Table I.


TABLE I
DATA SET SPECIFICATIONS

Trace         # Total Obs.   Time Interval (sec)   # Model Obs.   # Simulation Obs.
BC-pOct89     1000000        1759.62               800000         200000
BC-Oct89Ext   1000000        122797.83             800000         200000
Star Wars     40000          1800                  30000          10000

In this section, a new model, the Adjusted ARIMA(p, d, q) or AARIMA(p, d, q) model, is introduced. The model is an ARIMA(p, d, q) model with the first difference of the "stationary" series added as a regressor. In modelling measured Internet traffic time series it is observed that the ARIMA model gives poor goodness-of-fit statistics even though the residuals follow a white noise distribution. The AARIMA model is proposed as a quick and simple way to model Internet traffic that retains all the properties of the ARIMA models, such as white noise residuals, the autocorrelation function (ACF) and the partial autocorrelation function (PACF). All the goodness-of-fit statistics used for the ARIMA model can be used for the AARIMA model. The equation is:

$$ y_t = 0.5\,z_t + \mathrm{AR(terms)} + \mathrm{MA(terms)} \qquad (5) $$

where $z_t = (1 - B)\,y_t$.
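As a worked reading of equation (5), suppose for illustration that the AR part is the single lag $\phi_1 y_{t-1}$ and the MA part is only the current shock $\varepsilon_t$. Both choices are assumptions made here to keep the algebra short; they are not the models fitted in this paper. Substituting $z_t = y_t - y_{t-1}$ then gives:

$$
\begin{aligned}
y_t &= 0.5\,(y_t - y_{t-1}) + \phi_1 y_{t-1} + \varepsilon_t \\
0.5\,y_t &= (\phi_1 - 0.5)\,y_{t-1} + \varepsilon_t \\
y_t &= (2\phi_1 - 1)\,y_{t-1} + 2\,\varepsilon_t
\end{aligned}
$$

Under this literal reading the feedback term changes the effective lag coefficient and rescales the noise term. If $z_t$ is instead supplied to the estimation as an observed regressor, the coefficients are simply estimated with $0.5\,z_t$ included on the right-hand side.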

There are several methods for time series analysis and modelling, but the most popular and systematic is the Box-Jenkins methodology [33]. The Box-Jenkins methodology builds models of time series in a sequence of steps which are repeated until the optimum model is achieved. The Adjusted ARIMA model is built using the same steps followed in the Box-Jenkins approach to obtain the best ARIMA model. We then add the feedback adjustment to obtain the AARIMA model, and a long-range dependence validation test to measure the ability of the forecasted series to capture the long-range dependence of the original trace. The steps in our analysis can be summarized as follows:

• Testing for stationarity of the time series.
• Differencing to obtain a stationary series if the series is not stationary.
• Identifying the order of the AR component and the MA component from the autocorrelation plots of the stationary series.
• Adding the feedback term.
• Estimating the model parameters.
• Forecasting to obtain future unknown values of the time series.
• Long-range dependence validation: the ability of the forecasted series to capture the long-range dependence of the original trace is tested.
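The identification step above reads the AR and MA orders from autocorrelation plots. A minimal sketch of the sample autocorrelation function behind such a plot is shown below; the function name and the alternating test series are assumptions for illustration, and the partial autocorrelation function is not covered.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag (biased estimator, divisor n)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares, used to normalise
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


# Hypothetical input: a strictly alternating series has strong negative lag-1 correlation.
alternating = [1.0, -1.0] * 50
acf = sample_acf(alternating, 3)
print([round(r, 2) for r in acf])
```

In Box-Jenkins practice, a sample ACF that cuts off after lag q suggests an MA(q) component, while a slowly decaying ACF suggests that further differencing or an AR component is needed.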

V. RESULTS

In this section the ARIMA and AARIMA models are compared. Table II compares the ARIMA and AARIMA models for the observed series. It is evident that in all the traces the AARIMA model shows an improvement in all the criteria used to estimate goodness of fit.

TABLE II
AARIMA AND ARIMA MODEL PARAMETERS FOR MEASURED TRACES

Trace        Model    AIC     SER%   MAE   Iterations
BCOct89Ext   AARIMA   11.4    73     44    15
BCOct89Ext   ARIMA    12.8    146    88    10
BCpOct89     AARIMA   13.74   234    211   18
BCpOct89     ARIMA    15.13   467    421   15
Star Wars    AARIMA   8.24    15     9     16
Star Wars    ARIMA    9.67    30     19    13

where
• AIC is the Akaike Information Criterion.
• SER is the Standard Error of the Regression.
• MAE is the Mean Absolute Error.
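For reference, two of the statistics above can be computed as in the following sketch. It assumes the usual textbook definitions (MAE as the mean of absolute errors, SER as the square root of the residual sum of squares divided by the degrees of freedom); the paper does not spell out its formulas, and the sample numbers are hypothetical. The AIC is omitted because its exact form depends on the estimation software.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def standard_error_of_regression(actual, predicted, n_params):
    """sqrt(SSR / (n - k)), where k is the number of estimated parameters."""
    ssr = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(ssr / (len(actual) - n_params))


# Hypothetical observations and model output, for illustration only.
actual = [10, 12, 9, 14, 11]
predicted = [11, 11, 10, 12, 11]
mae = mean_absolute_error(actual, predicted)
ser = standard_error_of_regression(actual, predicted, n_params=2)
print(f"MAE = {mae:.3f}, SER = {ser:.3f}")
```

Lower values of both statistics indicate a closer fit, which is the sense in which Table II favours the AARIMA models.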

We measured the number of iterations to convergence, but there was no evident pattern. Another method used to compare the ARIMA and AARIMA models is the scatter plot. Fig. 1 shows the scatter plot of the measured MPEG-1 data against the simulated ARIMA and AARIMA series.

A. Comparison of Hurst Parameters of Measured Traces and Models

The Hurst parameters for the AARIMA and ARIMA models were compared with the Hurst parameters for the observed series, and the bias in the value of the Hurst parameter was obtained. The tool used to measure the self-similarity parameter is called Selfis and is freely available [40]. The following estimators were used: the Aggregate Variance method, the Whittle estimator and the Variance of Residuals method. It was noticed that the different estimators gave different values of H for the same series. Table III gives the value of the H parameter for the Public Domain traces.

To compare the accuracy of the AARIMA and ARIMA models, we obtained the bias of the Hurst parameter of each model above or below the Hurst parameter of the measured series for each estimator. Table IV gives the bias for each estimator for the Public Domain traces. The following results were obtained:

1) The AARIMA model gave values of H that were closest to the value of the H parameter of the measured series for all estimators.
2) The greatest bias was in the Whittle method for the ARIMA models.
3) The length of the simulated time series we used was 200000 readings for the Bellcore traces and 10000 for the Star Wars trace.
4) The bias is expressed as a percentage because the H parameter of the observed series is used as the threshold; this is itself an estimated value.

Fig. 1. Scatter Plot of Observed Star Wars Trace against Simulated ARIMA Series and AARIMA Series

TABLE III
HURST PARAMETER FOR MEASURED PUBLIC DOMAIN INTERNET TIME SERIES

Trace Name   Aggregate Variance   Variance of Residuals   Whittle
BCOct89Ext   0.933                1.123                   0.800
BCpOct89     0.683                0.690                   0.686
Star Wars    0.654                0.938                   0.523

TABLE IV
COMPARING BIAS IN H PARAMETER FROM MEASURED PUBLIC DOMAIN INTERNET TRAFFIC

Trace Name   Model    Aggregate Variance %   Variance of Residuals %   Whittle %
BCOct89Ext   AARIMA   0                      0                         0
BCOct89Ext   ARIMA    1                      7                         14
BCpOct89     AARIMA   6                      9                         29
BCpOct89     ARIMA    12                     17                        46
Star Wars    AARIMA   13                     1                         11
Star Wars    ARIMA    22                     2                         38

VI. CONCLUSION

In this paper we present a new model, the Adjusted ARIMA model, for modelling long-range dependent Internet traffic. The AARIMA model is proposed as a quick and simple way to model Internet traffic at millisecond time scales by retaining all the properties of the ARIMA models while capturing the self-similarity. We use the Box-Jenkins methodology as a framework for our modelling procedure. We construct our model by building the best ARIMA model possible for a trace and then adding our adjustment, a feedback term, to obtain the equivalent AARIMA model. The model displayed great accuracy for millisecond time scales.

Four H parameter estimators, namely the Periodogram method, the Aggregate Variance method, the Whittle estimator and the Variance of Residuals method, were used to find the H parameters of the AARIMA and ARIMA modelled traces and of the measured traces.

For the Public Domain traces, namely BC-Oct89Ext, BC-pOct89 and Star Wars MPEG-1, the AARIMA model gave values of H that were closest to the value of the H parameter of the measured series for all the estimators. The AARIMA model proved to be robust with respect to all four Hurst parameter estimators used. Only three Public Domain traces were available, so we could not do extensive testing. We had to model keeping in mind the limitations of the available traces, so the lengths of the predicted traffic series for the different data sets were different.

We obtained the number of iterations for the AARIMA and ARIMA models of the different traces for all our data sets, but we could not reach any conclusive result. More research is needed to establish the overhead of the approach. We also need to obtain other self-similar data, such as hydrological data, to validate our approach with data other than Internet data.

ACKNOWLEDGMENT

The authors would like to thank Hiba Mohamed Osman for helping in acquiring the data. The authors also thank Dr. Ahmed El Shiekh and Mr. Obiey Ahmed Elamin for helpful discussions of the results of this paper.

REFERENCES

[1] W. Willinger, V. Paxson, and M. S. Taqqu, “Self-similarity and heavy tails: Structural modelling of network traffic,” in A Practical Guide to Heavy Tails: Statistical Techniques for Analyzing Heavy Tailed Distributions, R. Adler, R. Feldman, and M. Taqqu, Eds. Boston: Birkhauser, 1998.

[2] W. E. Leland, W. Willinger, M. S. Taqqu, and D. V. Wilson, “On the self similar nature of Ethernet traffic,” in Proc. of the ACM SIGCOMM ’93, San Francisco, CA., Sep. 1993, pp. 183–193.

[3] W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson, “On the self similar nature of Ethernet traffic (extended version),” IEEE/ACM Trans. Networking, vol. 2, no. 1, pp. 1–15, 1994.

[4] M. E. Crovella and A. Bestavros, “Self-similarity in World Wide Web traffic: Evidence and possible causes,” IEEE/ACM Transactions on Networking, vol. 5, no. 6, pp. 835–846, 1997.

[5] M. Crovella and A. Bestavros, “Explaining World Wide Web traffic self-similarity,” Boston University, Tech. Rep. 1995-015, Oct. 1995.

[6] M. Crovella, M. Taqqu, and A. Bestavros, “Heavy-tailed probability distributions in the World Wide Web,” in A Practical Guide to Heavy Tails: Statistical Techniques and Applications, R. Adler, R. Feldman, and M. S. Taqqu, Eds. Boston, USA: Birkhauser, 1998.

[7] W. Willinger, V. Paxson, R. H. Riedi, and M. S. Taqqu, “Long-range dependence and data network traffic,” in Theory and Applications of Long-Range Dependence, P. Doukhan, G. Oppenheim, and M. S. Taqqu, Eds. Boston, USA: Birkhauser, 2003.


[8] W. Willinger, M. S. Taqqu, R. Sherman, and D. V. Wilson, “Self-similarity through high-variability: statistical analysis of Ethernet LAN traffic at the source level,” IEEE/ACM Trans. on Networking, vol. 5, no. 1, pp. 71–86, 1997.

[9] M. S. Taqqu, V. Teverovsky, and W. Willinger, “Estimators for long-range dependence: an empirical study,” Fractals, vol. 3, no. 4, pp. 785–788, 1995.

[10] A. Popescu, “Traffic self-similarity,” in Proc. of IEEE Intl. Conf. on Telecommunications (ICT2001), Bucharest, Romania, June 2001.

[11] M. S. Taqqu and V. Teverovsky, “On estimating the intensity of long-range dependence in finite and infinite variance time series,” in A Practical Guide to Heavy Tails: Statistical Techniques for Analyzing Heavy Tailed Distributions, R. Adler, R. Feldman, and M. Taqqu, Eds. Boston, USA: Birkhauser, 1998.

[12] J. Beran, Statistics for Long-Memory Processes. New York: Chapman Hall, 1994.

[13] C. K. Peng, S. V. Buldyrev, S. Havlin, M. Simons, H. E. Stanley, and A. L. Goldberger, “Mosaic organization of DNA nucleotides,” Physical Review, vol. 49, no. 2, pp. 1685–1689, 1994.

[14] P. Abry and D. Veitch, “Wavelet analysis of long-range dependent traffic,” IEEE Trans. Inform. Theory, vol. 44, no. 1, pp. 1–15, 1998.

[15] D. Veitch and P. Abry, “A wavelet based joint estimator for the parameters of long-range dependence,” IEEE Trans. Inform. Theory, vol. 45, no. 3, pp. 878–897, 1999.

[16] P. Abry, P. Flandrin, M. S. Taqqu, and D. Veitch, “Self-similarity and long-range dependence through the wavelet lens,” in Theory and Applications of Long-Range Dependence, P. Doukhan, G. Oppenheim, and M. S. Taqqu, Eds. Boston, USA: Birkhauser, 2003.

[17] D. Veitch, M. S. Taqqu, and P. Abry, “Meaningful MRA initialisation for discrete time series,” Signal Processing, vol. 80, no. 11, 2000.

[18] P. Abry, D. Veitch, and P. Flandrin, “Long-range dependence: revisiting aggregation with wavelets,” Journal of Time Series Analysis, vol. 19, no. 3, pp. 253–266, 1998.

[19] P. Abry, P. Flandrin, M. S. Taqqu, and D. Veitch, “Wavelets for the analysis, estimation and synthesis of scaling data,” in Self-similar Network Traffic Analysis and Performance Evaluation, K. Park and W. Willinger, Eds. Wiley, 2000.

[20] M. E. Crovella and A. Bestavros, “Traffic models in broadband networks,” IEEE Commun. Mag., vol. 35, no. 7, pp. 82–89, 1997.

[21] C. You and K. Chandra, “Time series models for Internet data traffic,” in Proc. 24th Conf. Local Computer Networks LCN ’99, 1999, pp. 164–171.

[22] K. Chandra, C. You, G. Olowoyeye, and C. Thompson, “Nonlinear time-series models of Ethernet traffic,” Center for Advanced Computation and Telecommunication, Department of Electrical Engineering, University of Massachusetts Lowell, CACT Report, June 1998.

[23] N. Groschwitz and G. Polyzos, “A time series model of long-term traffic on the NSFNET backbone,” in Proc. IEEE Int. Conf. Commun., May 1994, pp. 1400–1404.

[24] S. Basu, A. Mukherjee, and S. Klivansky, “Time series models for Internet traffic,” in Proc. IEEE INFOCOM ’96, vol. 2, San Francisco, CA., Mar. 1996, pp. 611–620.

[25] K. Papagiannaki, N. Taft, Z. Zhang, and C. Diot, “Long-term forecasting of Internet backbone traffic: observations and initial models,” in Proc. IEEE INFOCOM 2003, San Francisco, CA., Apr. 2003.

[26] J. Liu, Y. Shu, L. Zhang, F. Xue, and O. W. W. Yang, “Traffic modelling based on FARIMA models,” in Proc. IEEE Can. Conf. Elect. Comput. Eng., Edmonton, Alberta, Canada, May 1999, pp. 162–167.

[27] Y. Shu, Z. Jin, L. Zhang, L. Wang, and O. W. W. Yang, “Traffic prediction using FARIMA models,” in Proc. IEEE Int. Conf. Commun., vol. 2, Vancouver, Canada, June 1999, pp. 891–895.

[28] B. Krithikaivasan, Y. Zeng, K. Deka, and D. Medhi, “ARCH-based traffic forecasting and dynamic bandwidth provisioning for periodically measured nonstationary traffic,” IEEE/ACM Transactions on Networking, Aug. 2007, to be published.

[29] H. Tong, Non-linear Time Series, a Dynamical System Approach, ser. Oxford Science Publications. Clarendon Press, 1990.

[30] S. Mallat, “A theory for multiresolution signal decomposition: the wavelet representation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 7, pp. 674–693, July 1989.

[31] I. Daubechies, “Ten lectures on wavelets,” in CBMS-NSF Regional Conference in Applied Mathematics, vol. 61, SIAM, Philadelphia, May 1992.

[32] J. Johnston and J. DiNardo, Econometric Methods, 4th ed. Singapore: McGraw-Hill, 1997.

[33] G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, 2nd ed. San Francisco: Holden-Day, 1976.

[34] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, 2nd ed., ser. Springer Series in Statistics. New York: Springer-Verlag, 1991.

[35] C. Chatfield, The Analysis of Time Series: Theory and Practice. Chapman and Hall, 1975.

[36] M. Kendall, Time Series, 2nd ed. London: Charles Griffin, 1976.

[37] P. J. Diggle, Time Series: A Biostatistical Introduction. Oxford: Oxford University Press, 1990.

[38] O. Rose, “Statistical properties of MPEG video traffic and their impact on traffic modeling in ATM systems,” University of Wuerzburg, Institute of Computer Science Research Report Series, Tech. Rep. 101, Feb. 1995.

[39] W. E. Leland and D. V. Wilson, “High time-resolution measurement and analysis of LAN traffic: implications for LAN interconnection,” in Proc. IEEE INFOCOM 1991, Bal Harbour, FL, Apr. 1991, pp. 1360–1366.

[40] T. Karagiannis, M. Faloutsos, and M. Molle, “A user friendly self-similarity analysis tool,” ACM SIGCOMM Computer Communications Review, vol. 33, no. 3, 2003.


Copyright Information

© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.