Engineering Applications of Artificial Intelligence 25 (2012) 200–206
doi:10.1016/j.engappai.2011.04.004
Corresponding author. Tel.: +30 2321049380; fax: +30 2321049128. E-mail address: [email protected] (P. Mastorocostas).
Brief paper
A computational intelligence-based forecasting system for telecommunications time series
Paris Mastorocostas, Constantinos Hilas
Department of Informatics & Communications, Technological Educational Institute of Serres, Terma Magnesias Street, 62124 Serres, Greece
Article info
Article history:
Received 6 December 2010
Received in revised form
25 February 2011
Accepted 6 April 2011
Available online 25 September 2011
Keywords:
Dynamic TSK fuzzy neural system
Internal feedback
Telecommunications data
Non-linear time series forecasting
Abstract
In this work a computational intelligence-based approach is proposed for forecasting outgoing
telephone calls in a University Campus. A modified Takagi–Sugeno–Kang fuzzy neural system is
presented, where the consequent parts of the fuzzy rules are neural networks with an internal
recurrence, thus introducing the dynamics to the overall system. The proposed model, entitled Locally
Recurrent Neurofuzzy Forecasting System (LR-NFFS), is compared to well-established forecasting
models, where its particular characteristics are highlighted.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
The primary motive for telecommunications service provision is profit. This is why charging and billing are vital in the telecommunications business. However, the aim of telecommunications managers is not only the maximization of profit but also the reduction of unnecessary cost.
Making use of the historical data, telecommunications managers may predict the future demand by creating a reasonably accurate forecast of the call volume. Planning ventures, infrastructure investment and call volume strategy are topics for which managerial decisions depend on the forecast. Forecasting is an integral input for network traffic management, infrastructure optimization and planning, and the scenario planning process. To successfully manage their business, carriers must rely on the data to monitor, analyze and optimize their systems in order to map future trends and usage patterns.
In the case of large organizations, which are actually the customers of the telecommunications carriers, the managerial policy has to be based on the allotment of telephony services to employees, primarily for use in normal business activities and, secondarily and on a limited basis, for personal use. With the primary mission of providing cost-effective voice communication services, managers must find and capitalize on opportunities for controlling telecommunications costs.
A case of such an organization is a University Campus with more than 6000 employees and 70,000 students, and an extended telecommunications infrastructure with more than 5500 telephone terminals. Due to the continuous increase of the faculty members and staff, new telephone numbers are added daily, and an increasing demand for outgoing trunks exists. It is obvious that the changes in call volume are vital to the planning of future installations.
The University holds an extended database, made of the Call Detail Records (CDR) of the Private Branch Exchange (PBX), which includes information such as the call origin, the area code and exchange, and the duration of each telephone call. The database is mainly used to determine the total number of calls, as well as the number of national, international and mobile calls per employee per month. A forecast of future call volume will help University managers to take financial decisions and negotiate the tariffs with national service providers. It is noticed that the call classification into different categories reveals distinct patterns between destinations. Calls to national destinations comprise almost half the volume of the total outgoing calls from the campus. On the other hand, calls to mobile destinations are subject to higher tariffs and they demonstrate an increasing trend during the last decade. This is further evidence of the high penetration rate of mobile services. In fact, the International Telecommunication Union (ITU) reports a penetration rate of 119.12% for Greece in 2009 (ITU-T ICTeye, 2010).
In the past, the forecasting ability of several well-established statistical methods on the University's call traffic has been studied (Hilas et al., 2006). Linear models are also suggested for forecasting trends in telecommunications data by ITU Recommendation E.507 (Madden and Joachim, 2007).
In this perspective, it would be very interesting to apply non-linear computational intelligence approaches to the same kind of data, in order to test their ability to forecast the outgoing call volume of a telecommunications network. Therefore, a Locally Recurrent NeuroFuzzy Forecasting System (LR-NFFS) is proposed in the present work, and its performance is compared with familiar forecasting approaches, namely a series of seasonally adjusted linear extrapolation methods, exponential smoothing methods and the SARIMA method. All comparisons are performed on real-world data.
The rest of the paper is organized as follows: in Section 2, a brief presentation of the classical forecasting methods that are compared with our proposed model is given. In Section 3, the Locally Recurrent NeuroFuzzy Forecasting System is presented. The training algorithm used to train the model is described in Section 4. In Section 5, the data used in the paper and the outcome of the comparative analysis of the methods are presented, while Section 6 hosts the concluding remarks.
2. Classical forecasting methods
In this section, the time series analysis methods that were used to compare and evaluate our proposed LR-NFFS method are briefly presented.
First, a simple method to forecast future values of the time series was used. This method, known as Naïve Forecast 1 (NF1) (Makridakis et al., 1998), takes the most recent observation as the forecast for the next time interval. After that, another simple method, which takes into account the seasonal factors, was applied. The method is somewhat different from the Naïve Forecast 2 method (NF2), which is also described in the paper referenced above.
The procedure of NF2 is to remove seasonality from the original data in order to obtain a seasonally adjusted series. Once the seasonality has been removed, one can use the most recent seasonally adjusted value of the series as a forecast for the next seasonally adjusted value. In contrast with the above procedure, we used the trend-cycle component to forecast the future values of the series by means of linear extrapolation. Then, the projected trend-cycle component was adjusted with the use of the identified seasonal factors (Hilas et al., 2006). Thus, when multiplicative seasonality is assumed we have Linear Extrapolation with Multiplicative Seasonal Adjustment (LESA-M), while in the case of additive seasonality we have LESA-ADD.
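As an illustration, the seasonally adjusted linear extrapolation described above can be sketched as follows. The decomposition here uses simple per-season averages rather than the full classical decomposition of Hilas et al. (2006), and all function names are illustrative:

```python
def seasonal_factors(series, period=12, multiplicative=True):
    """Estimate one seasonal factor per season as the average deviation of
    that season's observations from the overall mean (a simplified stand-in
    for classical-decomposition seasonal factors)."""
    overall = sum(series) / len(series)
    factors = []
    for s in range(period):
        vals = series[s::period]
        m = sum(vals) / len(vals)
        factors.append(m / overall if multiplicative else m - overall)
    return factors

def lesa_forecast(series, horizon, period=12, multiplicative=True):
    """LESA-M / LESA-ADD sketch: deseasonalize, fit a least-squares line,
    extrapolate it, then reseasonalize the projected values."""
    f = seasonal_factors(series, period, multiplicative)
    if multiplicative:
        adj = [x / f[i % period] for i, x in enumerate(series)]
    else:
        adj = [x - f[i % period] for i, x in enumerate(series)]
    # ordinary least-squares line through the deseasonalized series
    n = len(adj)
    xbar, ybar = (n - 1) / 2.0, sum(adj) / n
    slope = (sum((i - xbar) * (y - ybar) for i, y in enumerate(adj))
             / sum((i - xbar) ** 2 for i in range(n)))
    intercept = ybar - slope * xbar
    out = []
    for h in range(horizon):
        t = n + h
        trend = intercept + slope * t  # extrapolated trend-cycle value
        out.append(trend * f[t % period] if multiplicative
                   else trend + f[t % period])
    return out
```

A constant series with multiplicative adjustment simply reproduces its level, while a trending series yields an increasing extrapolation.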
A familiar group of time series analysis methods are the exponential smoothing methods. In exponential smoothing, a particular observation of the time series is expressed as a weighted sum of the previous observations. The weights for the previous data values are terms of a geometric series and get smaller as the observations move further into the past. Simple Exponential Smoothing (SES) applies to processes without trend. In order to accommodate linear trend, Holt (1957) modified the simple exponential smoothing model.
Winters (1960) extended Holt's method in order to cope with seasonal data. Multiplicative seasonal models (Winters MS) as well as additive seasonal models (Winters AS) exist.
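The SES and Holt recursions mentioned above can be sketched in a few lines. The smoothing constants and the initialization scheme below are illustrative choices, not those of the original study:

```python
def ses(series, alpha=0.3):
    """Simple Exponential Smoothing: the smoothed level is a weighted sum
    of past observations with geometrically decaying weights; the final
    level serves as the one-step-ahead forecast."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def holt(series, alpha=0.3, beta=0.1, horizon=1):
    """Holt's linear method: separate exponential smoothing of the level
    and the trend, then linear projection over the forecast horizon."""
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev) + (1 - beta) * trend
    return level + horizon * trend
```

On a perfectly linear series Holt's recursion tracks the trend exactly, which is a quick sanity check for an implementation.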
Although linear trend represents an improvement on simple exponential smoothing, it cannot cope with more complex types of trend. Other modifications of SES can be applied to time series that exhibit damped trend. A damped trend refers to a regression component for the trend in the updating equation, which is expressed by means of a dampening factor. As before, an exponential smoothing model with damped trend and additive seasonality (DAMP AS) and its multiplicative seasonality counterpart (DAMP MS) exist. One may also try to fit a damped trend model to time series with no seasonality (DAMP NoS).
For a comprehensive review of exponential smoothing methods, readers are referred to the work of Gardner (1985). The above methods are popular in industry due to their simplicity and the accuracy that can be obtained with minimal effort in model identification.
Another familiar method to analyze stationary univariate time series data was developed by Box and Jenkins (1976). The method, which is called the Auto Regressive Integrated Moving Average (ARIMA) method, presumes weak stationarity, equally spaced intervals or observations, and at least 30–50 observations. A seasonal variant, SARIMA, also exists.
After fitting a time series model, one can evaluate it with forecast fit measures. The data set is usually divided into a "training" set and a "validation" or "holdout" set. The training set is used to estimate any parameters and to initialize the method. Forecasts are made for the validation set, which was not used in the model fitting. The observer may subtract the forecast value from the corresponding measured value of the validation set and obtain a measure of error or bias. This is a genuine measure of the forecasting ability of the model.
To evaluate the amount of forecast error, one may employ the mean absolute error (MAE), the mean squared error (MSE), the sum of squared errors (SSE) for the whole forecast, and the root mean squared error (RMSE). The aforementioned statistics measure accuracy, but their magnitudes depend on the scale of the data, so they do not facilitate comparisons between methods, especially across different time series and for different time intervals. A commonly used statistic that deals with this problem is the mean absolute percentage error (MAPE) (Makridakis et al., 1998).
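For instance, RMSE and the scale-free MAPE can be computed directly from the holdout errors; the function names are illustrative:

```python
import math

def rmse(actual, forecast):
    """Root mean squared error over a holdout set; scale-dependent."""
    return math.sqrt(sum((a - f) ** 2
                         for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error (in %): scale-free, so it allows
    comparisons across series measured in different units."""
    return 100.0 * sum(abs((a - f) / a)
                       for a, f in zip(actual, forecast)) / len(actual)
```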
Another statistic, which allows a relative comparison of formal methods with naïve approaches and also squares the errors involved so that large errors are given much more weight than small errors, is Theil's U-statistic:

$$U = \sqrt{\frac{\sum_{t=1}^{n-1}\left(FPE_{t+1}-APE_{t+1}\right)^2}{\sum_{t=1}^{n-1}\left(APE_{t+1}\right)^2}} \quad (1)$$

where $FPE_{t+1} = (F_{t+1}-X_t)/X_t$ is the forecast relative error, and $APE_{t+1} = (X_{t+1}-X_t)/X_t$ is the actual relative error.
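Eq. (1) translates directly into code; the helper below assumes forecast[t] aligns with actual[t] (forecast[0] is never used):

```python
import math

def theil_u(actual, forecast):
    """Theil's U-statistic per Eq. (1): compares forecast relative errors
    with actual relative changes. U = 1 matches the naive NF1 forecast
    (F[t+1] = X[t]); U < 1 beats it; U = 0 is a perfect forecast."""
    num = den = 0.0
    for t in range(len(actual) - 1):
        fpe = (forecast[t + 1] - actual[t]) / actual[t]  # forecast rel. error
        ape = (actual[t + 1] - actual[t]) / actual[t]    # actual rel. change
        num += (fpe - ape) ** 2
        den += ape ** 2
    return math.sqrt(num / den)
```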
3. Locally Recurrent Neurofuzzy Forecasting System (LR-NFFS)
The suggested Locally Recurrent Neurofuzzy Forecasting System (LR-NFFS), for the case of an m-input–single-output system, comprises generalized Takagi–Sugeno–Kang (Takagi and Sugeno, 1986) rules of the form

$$\text{IF } u_1(k) \text{ is } A_1 \text{ AND } u_2(k) \text{ is } A_2 \text{ AND} \ldots \text{AND } u_m(k) \text{ is } A_m \text{ THEN } g(k) = g(u(k)) \quad (2)$$

where $u(k) = [u_1(k),\ldots,u_m(k)]^T$ is the input vector. The rule output $g(u(k))$ is implemented by a recurrent neural network of the form $1 \times H \times 1$, having a linear input layer, while the hidden and output layers consist of neurons with internal feedback. In particular, the LR-NFFS has the following structural characteristics:
• The premise and defuzzification parts are static, described by

$$\mu_j(k) = f_\mu(u(k);\, m_j(k), \sigma_j(k)) \quad (3)$$

$$y(k) = f_y(\mu_1(k),\ldots,\mu_r(k),\, g_1(k),\ldots,g_r(k)) \quad (4)$$

where r is the number of rules, $\mu_j(k)$ is the degree of fulfillment of the j-th rule, and $m_j(k) = [m_{j1}(k),\ldots,m_{jm}(k)]^T$ and $\sigma_j(k) = [\sigma_{j1}(k),\ldots,\sigma_{jm}(k)]^T$ are the parameter vectors of the Gaussian membership functions. The degree of fulfillment is the algebraic product of the corresponding membership functions:

$$\mu_j(k) = \prod_{i=1}^{m}\mu_{A_{ji}}(u_i(k)) = \prod_{i=1}^{m}\exp\left\{-\frac{1}{2}\,\frac{(u_i-m_{ji})^2}{(\sigma_{ji})^2}\right\} \quad (5)$$
• The system's overall output is derived via the weighted average defuzzification method, as given below:

$$y(k) = \frac{\sum_{j=1}^{r}\mu_j(k)\, g_j(k)}{\sum_{j=1}^{r}\mu_j(k)} \quad (6)$$
• The consequent parts of the fuzzy rules are dynamic. Their structural elements are neurons with local output feedback at the hidden and output layers. Thus, dynamics is introduced to the overall system through these feedback connections. No feedback connections of the rule's total output or connections among neurons of the same layer exist.

Fig. 1. General representation of the LR-NFFS.
The operation of the consequent part of the j-th fuzzy rule is described by the following set of equations:

$$O_{ji}(k) = f_1\left(\sum_{l=1}^{m} w_{jil}^{(1)}\, u_l(k) + \sum_{l=1}^{D_o} w_{jil}^{(2)}\, O_{ji}(k-l) + w_{ji}^{(3)}\right),\quad i=1,\ldots,H,\; j=1,\ldots,r \quad (7a)$$

$$g_j(k) = f_2\left(\sum_{i=1}^{H} w_{ji}^{(4)}\, O_{ji}(k) + \sum_{l=1}^{D_g} w_{jl}^{(5)}\, g_j(k-l) + w_j^{(6)}\right) \quad (7b)$$
where the following notations are used:

• $f_1$, $f_2$ are the neuron activation functions of the hidden and the output layers, respectively. In the following, both activation functions are chosen to be the hyperbolic tangent, $\tanh(\cdot)$.
• $O_{ji}(k)$ is the output of the i-th hidden neuron of the j-th fuzzy rule, at time k.
• $g_j(k)$ is the output of the j-th fuzzy rule.
• $D_o$ and $D_g$ are the time lag orders of the local output feedback for the neurons of the hidden and the output layer, respectively.
• $w_{jil}^{(1)}$, $w_{jil}^{(2)}$ are the synaptic weights at the hidden layer.
• $w_{ji}^{(4)}$, $w_{jl}^{(5)}$ are the synaptic weights at the output layer.
• $w_{ji}^{(3)}$, $w_j^{(6)}$ are the bias terms of the hidden neurons and the output neuron, respectively.
The general representation of the LR-NFFS is shown in Fig. 1 and the formation of the consequent part of the fuzzy rules is given in Fig. 2.
Fig. 2. Consequent part of the j-th fuzzy rule.
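A minimal forward pass through Eqs. (5)–(7) can be sketched as follows for the single-input case used later in the paper. The parameter container and all key names are hypothetical, and training is omitted:

```python
import math

def lr_nffs_forward(u_seq, rules):
    """Single-input LR-NFFS forward pass (Eqs. (5)-(7)). Each rule is a dict
    with illustrative keys: Gaussian premise parameters 'm', 'sigma', and
    consequent weights 'w1' (input->hidden), 'w2' (hidden output feedback,
    one list of Do lags per hidden neuron), 'w3' (hidden biases),
    'w4' (hidden->output), 'w5' (output feedback, Dg lags), 'w6' (bias)."""
    H = len(rules[0]['w1'])
    Do = len(rules[0]['w2'][0])
    Dg = len(rules[0]['w5'])
    # delay lines: O_hist[j][i][l] holds O_ji(k-l-1), g_hist[j][l] holds g_j(k-l-1)
    O_hist = [[[0.0] * Do for _ in range(H)] for _ in rules]
    g_hist = [[0.0] * Dg for _ in rules]
    y_seq = []
    for u in u_seq:
        mu, g = [], []
        for j, rule in enumerate(rules):
            # Eq. (5): Gaussian degree of fulfillment (single input)
            mu_j = math.exp(-0.5 * ((u - rule['m']) / rule['sigma']) ** 2)
            # Eq. (7a): hidden neurons with local output feedback
            O = []
            for i in range(H):
                a = rule['w1'][i] * u + rule['w3'][i]
                a += sum(rule['w2'][i][l] * O_hist[j][i][l] for l in range(Do))
                O.append(math.tanh(a))
            # Eq. (7b): output neuron with local output feedback
            b = sum(rule['w4'][i] * O[i] for i in range(H)) + rule['w6']
            b += sum(rule['w5'][l] * g_hist[j][l] for l in range(Dg))
            g_j = math.tanh(b)
            # shift the feedback delay lines
            for i in range(H):
                O_hist[j][i] = [O[i]] + O_hist[j][i][:-1]
            g_hist[j] = [g_j] + g_hist[j][:-1]
            mu.append(mu_j)
            g.append(g_j)
        # Eq. (6): weighted average defuzzification
        y_seq.append(sum(m_ * g_ for m_, g_ in zip(mu, g)) / sum(mu))
    return y_seq
```

Since each rule output is a tanh value, the defuzzified output always lies in (−1, 1), which matches the normalized input space used in Section 5.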
The selection of the aforementioned features is dictated by the following:

• The LR-NFFS preserves the local learning characteristics of the classic TSK model, since it comprises fuzzily interconnected subsystems, which are local-recurrent–global-feedforward neural networks (Tsoi and Back, 1994). The rules are not linked with each other in time, either through external or internal feedback. They are connected merely via the defuzzification part. The premise part performs the input space partition and the consequent part performs the input–output mapping. Accordingly, each recurrent neural network is capable of tracking the dynamics of the internal states of the unknown system, in the input space domain that is set by the respective premise part.
• The selection of the particular neuron as a structural unit is based on the DFNN model (Mastorocostas and Theocharis, 2002), where a more complicated form, nevertheless sharing the same underlying philosophy, was employed and exhibited improved identification characteristics compared to dynamic elements that have local synapse feedback. Additionally, the local output feedback contributes to the stability of the neuron's response.
4. The training algorithm
4.1. The algorithm
The LR-NFFS is trained using the dynamic resilient backpropagation algorithm (D-RPROP) (Mastorocostas, 2004), which constitutes a modification of the standard RPROP method (Riedmiller and Braun, 1993), applicable to fuzzy models whose consequent parts are recurrent neural networks. Only minor modifications are made, such that the method takes into consideration the special features of the LR-NFFS, requiring calculation of the error gradients for the feedback weights, as well as for the parameters of the membership functions.
Let us consider a training data set of $k_f$ input–output pairs. The Mean Squared Error is selected as the error measure, where $y_d(k)$ is the desired output:

$$E = \frac{1}{k_f}\sum_{k=1}^{k_f}\left(y(k)-y_d(k)\right)^2 \quad (8)$$
Considering $\partial E(t)/\partial w_i$ and $\partial E(t-1)/\partial w_i$ to be the derivatives of E with respect to a consequent weight $w_i$ at the present, t, and the preceding, t−1, epochs, respectively, D-RPROP is described in pseudo-code as follows:

(a) For all weights $w_i$, initialize the step sizes $\Delta_i^{(1)} = \Delta_0$.
Repeat
(b) For all weights $w_i$, compute the error gradient $\partial E(t)/\partial w_i$.
(c) For all weights $w_i$, update the step sizes:

If $\dfrac{\partial E(t)}{\partial w_i}\cdot\dfrac{\partial E(t-1)}{\partial w_i} > 0$ then $\Delta_i^{(t)} = \min\{\eta^+ \Delta_i^{(t-1)},\, \Delta_{\max}\}$ (9)

Else if $\dfrac{\partial E(t)}{\partial w_i}\cdot\dfrac{\partial E(t-1)}{\partial w_i} < 0$ then $\Delta_i^{(t)} = \max\{\eta^- \Delta_i^{(t-1)},\, \Delta_{\min}\}$ (10)

Else $\Delta_i^{(t)} = \Delta_i^{(t-1)}$ (11)

(d) Update the weights: $\Delta w_i(t) = -\mathrm{sign}\left(\dfrac{\partial E(t)}{\partial w_i}\right)\Delta_i^{(t)}$ (12)

Until convergence

where the step sizes are bounded by $\Delta_{\min}$ and $\Delta_{\max}$ in order to avoid overflow/underflow problems of floating point variables. The increase and attenuation factors are usually set to $\eta^+ \in [1.01, 1.2]$ and $\eta^- \in [0.75, 0.99]$, respectively. All four parameters are determined using a trial and error approach.
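The step-size adaptation of Eqs. (9)–(12) can be sketched as a pure function over flat gradient vectors; the names and the flat weight-vector layout are illustrative, and the sign-change backtracking of some RPROP variants is omitted because the pseudo-code above does not include it:

```python
def drprop_step(grads, prev_grads, steps, eta_plus=1.1, eta_minus=0.8,
                d_min=1e-4, d_max=0.9):
    """One D-RPROP epoch update (Eqs. (9)-(12)): adapt each step size from
    the sign agreement of successive gradients, then move each weight
    against its gradient sign. Returns (weight deltas, new step sizes)."""
    deltas, new_steps = [], []
    for g, pg, d in zip(grads, prev_grads, steps):
        if g * pg > 0:                      # same sign: accelerate, Eq. (9)
            d = min(eta_plus * d, d_max)
        elif g * pg < 0:                    # sign change: back off, Eq. (10)
            d = max(eta_minus * d, d_min)
        # otherwise keep the step size unchanged, Eq. (11)
        sign = (g > 0) - (g < 0)
        deltas.append(-sign * d)            # Eq. (12)
        new_steps.append(d)
    return deltas, new_steps
```

Note that only the sign of the gradient enters the weight update; the magnitude of the move is carried entirely by the adapted step size.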
4.2. Extraction of the error gradients
Since the premise and defuzzification parts are static, thegradients of the error function with respect to the weights ofthe premise part are derived using the standard partial deriva-tives. For batch learning, mjið1Þ ¼ . . .¼mjiðkf Þ and sjið1Þ ¼ . . .¼ sjiðkf Þ. Thus, the error gradients of the premise part weightsare given by the following formulae:
@E
@mji¼Xkf
k ¼ 1
2
kfðyðkÞ�ydðkÞÞ
gjðkÞ�yðkÞPrl ¼ 1 mlðkÞ
mjðkÞuiðkÞ�mjiðkÞ
ðsjiðkÞÞ2
( )ð13aÞ
@E
@sji¼Xkf
k ¼ 1
2
kfðyðkÞ�ydðkÞÞ
gjðkÞ�yðkÞPrl ¼ 1 mlðkÞ
mjðkÞuiðkÞ�mjiðkÞ
ðsjiðkÞÞ3
( )ð13bÞ
The gradients of E with respect to the weights of the consequent part should be calculated using ordered partial derivatives (Piche, 1994), since there exist temporal relations through the feedback connections. Calculation of the error gradients is based on the use of Lagrange multipliers, as shown below:

$$\frac{\partial^+ E}{\partial w_{jil}^{(1)}} = \sum_{k=1}^{k_f} \lambda_{ji}^{(O)}(k)\, f_1'(k,j,i)\, u_l(k) \quad (14a)$$

$$\frac{\partial^+ E}{\partial w_{jil}^{(2)}} = \sum_{k=1}^{k_f} \lambda_{ji}^{(O)}(k)\, f_1'(k,j,i)\, O_{ji}(k-l) \quad (14b)$$

$$\frac{\partial^+ E}{\partial w_{ji}^{(3)}} = \sum_{k=1}^{k_f} \lambda_{ji}^{(O)}(k)\, f_1'(k,j,i) \quad (14c)$$

$$\frac{\partial^+ E}{\partial w_{ji}^{(4)}} = \sum_{k=1}^{k_f} \lambda_j^{(g)}(k)\, f_2'(k,j)\, O_{ji}(k) \quad (15a)$$

$$\frac{\partial^+ E}{\partial w_{jl}^{(5)}} = \sum_{k=1}^{k_f} \lambda_j^{(g)}(k)\, f_2'(k,j)\, g_j(k-l) \quad (15b)$$

$$\frac{\partial^+ E}{\partial w_j^{(6)}} = \sum_{k=1}^{k_f} \lambda_j^{(g)}(k)\, f_2'(k,j) \quad (15c)$$

with the Lagrange multipliers derived as follows:

$$\lambda_j^{(g)}(k) = \frac{2}{k_f}\left(y(k)-y_d(k)\right)\frac{\mu_j(k)}{\sum_{i=1}^{r}\mu_i(k)} + \sum_{\substack{l=1\\ k+l \le k_f}}^{D_g} \lambda_j^{(g)}(k+l)\, f_2'(k+l,j)\, w_{jl}^{(5)} \quad (16a)$$

$$\lambda_{ji}^{(O)}(k) = \sum_{\substack{l=1\\ k+l \le k_f}}^{D_o} \lambda_{ji}^{(O)}(k+l)\, f_1'(k+l,j,i)\, w_{jil}^{(2)} + \lambda_j^{(g)}(k)\, f_2'(k,j)\, w_{ji}^{(4)} \quad (16b)$$
where $f_2'(k+l,j)$ and $f_1'(k+l,j,i)$ are the derivatives of $g_j(k+l)$ and $O_{ji}(k+l)$, respectively, with respect to their arguments. Eqs. (16) are backward difference equations that can be solved for $k = k_f-1,\ldots,1$ using the boundary conditions:

$$\lambda_j^{(g)}(k_f) = \frac{2}{k_f}\left(y(k_f)-y_d(k_f)\right)\frac{\mu_j(k_f)}{\sum_{i=1}^{r}\mu_i(k_f)} \quad (17a)$$

$$\lambda_{ji}^{(O)}(k_f) = \lambda_j^{(g)}(k_f)\, f_2'(k_f,j)\, w_{ji}^{(4)} \quad (17b)$$
Fig. 3. Monthly number of outgoing calls. (a) Calls to national destinations and (b) calls to mobile destinations.
Table 1
Structural and learning characteristics of the LR-NFFS.

LR-NFFS structural characteristics
Kind of calls    Rules   H   Do   Dg
National         3       5   2    0
Mobile           2       6   2    2

D-RPROP learning parameters
                 η+      η−     Δmin   Δmax   Δ0
Premise part     1.01    0.95   1E-4   0.30   0.02
Consequent part  1.10    0.80   1E-4   0.90   0.02
5. Simulation results
The data in hand cover a period of 10 years, January 1998–December 2007, and are the monthly outgoing calls to national and mobile destinations originating from the PBX of a large organization.

Due to the varying number of days in different months (e.g. February has 28 days while January has 31), all data have been normalized according to:

$$W_t = X_t\,\frac{365.25/12}{\text{number of days in month } t} \quad (18)$$
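Eq. (18) can be applied with the standard library's calendar module; the start-date arguments below are illustrative:

```python
import calendar

def normalize_monthly(counts, start_year=1998, start_month=1):
    """Apply Eq. (18): rescale each monthly count to a standard month
    length of 365.25/12 days, removing the 28-31 day variation."""
    out = []
    year, month = start_year, start_month
    for x in counts:
        days = calendar.monthrange(year, month)[1]  # actual days in month
        out.append(x * (365.25 / 12) / days)
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return out
```

With this scaling, e.g., 310 calls in a 31-day month and 280 calls in a 28-day month normalize to the same daily-rate-equivalent value.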
Each data set is divided into two subsets: the training set, which is used to estimate the parameters associated with each method, and the validation set, which is used for the evaluation of the forecasts. The training set is chosen to be 9 years (108 months) long and the validation set 1 year (12 months) long.
From visual observation of both time series (Fig. 3a and b), a distinct seasonal pattern is noticeable, made prevalent by the minimum that occurs in August. Apart from that, the number of calls to mobile destinations (Fig. 3b) shows an increasing trend, which comports with reports on mobile services penetration (ITU-T ICTeye, 2010).
The parameters, which are estimated during the fitting procedure, are used to forecast future values of each series. Since the validation set is not used in model fitting, these forecasts are genuine forecasts and can be used to evaluate the forecasting ability of each model. The forecasting accuracy can be evaluated by means of the accuracy measures mentioned in Section 2.
The fuzzy models for the two categories of outgoing calls are chosen to be single-input–single-output, with the input being the number of national/mobile calls of the previous month, in order to investigate whether the models are able to discover the temporal dependencies of the time series through their recurrent structure alone. Several LR-NFFSs with different structural characteristics are examined and various combinations of the learning parameters are tested. Selection of the model–parameter combination is based on the criteria of (a) effective identification of the time series and (b) moderate complexity of the resulting models. The selected structural characteristics of the LR-NFFSs are given in Table 1.
The training process is carried out in parallel mode and lasts for 1500 epochs. The learning parameters of D-RPROP are hosted in Table 1. The input space is normalized to [−1, 1] and the initial membership functions are uniformly distributed. The initial and final membership functions for the case of national calls are shown in Fig. 4. It can be noticed that the final model efficiently covers the input space, with the minimum degree of overlapping between two membership functions being around 0.2, while they keep preserving the local modeling approach of Takagi–Sugeno–Kang fuzzy systems.
For each method, three holdout accuracy measures were computed: the RMSE, the MAPE, and Theil's U-statistic. A smaller value of each statistic indicates a better fit of the method to the observed data. The results for each of the twelve models are presented in Table 2; bold numbers indicate best fit.
To further explore the forecasting ability of the methods, plots of the observed values for the validation set with the best fit model (LR-NFFS) and the second best fit model for each case are drawn (Fig. 5). Also, 95% prediction intervals (PI) for the forecasts are presented in the plots. The prediction (or confidence) intervals were estimated during the fitting process of the second best model and are denoted as Upper Confidence Level (UCL) and Lower Confidence Level (LCL).

Fig. 4. Input space partition for the case of national calls. (a) Initial partition and (b) final partition.

Table 2
Comparative performance evaluation (testing data set).

Model          National calls               Mobile calls
               RMSE   MAPE    Theil's U     RMSE    MAPE    Theil's U
LR-NFFS        4367   12.118  0.301         5742    13.166  0.326
NF1            8914   23.846  1.000         12,009  28.875  1.000
LESA-M         8570   24.391  0.722         9915    23.046  0.747
LESA-ADD       8418   24.798  0.713         10,271  27.218  0.699
SES            6748   20.943  0.515         9671    24.698  0.569
Holt's linear  6753   27.552  0.506         11,191  35.507  0.663
Winters MS     7120   18.415  0.578         9114    20.475  0.665
Winters AS     6903   17.741  0.553         8495    21.875  0.573
Damped NoS     6862   21.422  0.512         11,962  31.756  0.715
Damped MS      7080   19.072  0.573         7419    15.958  0.524
Damped AS      7194   19.838  0.571         9020    23.584  0.599
SARIMA         6064   9.535   0.513         10,102  20.793  0.775

Fig. 5. Forecast fit with 95% PI and a comparison of LR-NFFS with the second best fit method for each data set. (a) National calls and (b) mobile calls.
As regards the national time series (NAT), visual observation of the plot (Fig. 5a) reveals the differences between the two best-fit models. LR-NFFS gives a better forecast, as it follows the evolution of the series more closely and identifies the first local minimum that appears in February, but misses the significance of the minimum in August. It should also be stressed that both forecasts fit well within the 95% confidence intervals and would bear scrutiny at even tighter confidence levels.
As regards the mobile time series (MOB), it is also better predicted by the proposed LR-NFFS method (Fig. 5b). The second best-fit model (indicated by all three statistics in Table 2) is a damped trend model with multiplicative seasonality (Damped MS). One may observe how the LR-NFFS forecast follows the actual data pattern against the second best model, which misses the behavior of the first two months (January and February, 2007). Moreover, the actual data are not even within the 95% prediction interval of the Damped MS forecast.
Interestingly, the second best-fit model (Damped MS) was the best-fit model indicated for the same type of calls in a past analysis (Hilas et al., 2006), where the damping was attributed to "the high cost of mobile calls, which refrains users from making many calls to mobile destinations and retards the upward tendency". From our current point of view, an alternative reasoning for this damped trend may be the approach to a saturation point in mobile penetration.
6. Conclusion
A novel locally recurrent neurofuzzy forecasting system has been proposed. The LR-NFFS is based on the TSK fuzzy model, with the consequent parts of the fuzzy rules consisting of small recurrent neural networks with local output internal feedback, thus preserving the local modeling characteristics of the classic fuzzy model.
The ability of the proposed system to forecast telecommunications time series has been evaluated by applying it to real-world data. A comparative analysis with a series of well-established forecasting models has been conducted, highlighting the efficiency of the proposed forecaster. Moreover, a recent review of forecasting in operational research (Fildes et al., 2008) concluded that the damped trend can "reasonably claim to be a benchmark forecasting method for all others to beat", which was the case with our LR-NFFS approach for the mobile data.
Acknowledgment
The authors wish to acknowledge the financial support provided by the Research Committee of the Serres Institute of Education and Technology under grant SAT/IC/01122010-60/15.
References
Box, G.E.P., Jenkins, G.M., 1976. Time Series Analysis: Forecasting and Control, 2nd ed. Holden-Day, San Francisco.
Fildes, R., Nikolopoulos, K., Crone, S., Syntetos, A., 2008. Forecasting and operational research: a review. Journal of the Operational Research Society 59, 1150–1172.
Gardner Jr., E.S., 1985. Exponential smoothing: the state of the art. Journal of Forecasting 4, 1–28.
Hilas, C.S., Goudos, S.K., Sahalos, J.N., 2006. Seasonal decomposition and forecasting of telecommunication data: a comparative case study. Technological Forecasting and Social Change 73, 495–509.
Holt, C.E., 1957. Forecasting Trends and Seasonals by Exponentially Weighted Moving Averages. ONR Memorandum 52, Carnegie Institute of Technology, Pittsburgh, USA. (Reprinted in 2004: International Journal of Forecasting 20, 5–10.)
ITU-T ICTeye Reports, 2010. Available online: http://www.itu.int/ITU-D/ICTEYE/Reports.aspx [last accessed: December 2010].
Madden, G., Joachim, T., 2007. Forecasting telecommunications data with linear models. Telecommunications Policy 31, 31–44.
Makridakis, S.G., Wheelwright, S.C., McGee, V.E., 1998. Forecasting: Methods and Applications, 3rd ed. Wiley, New York.
Mastorocostas, P.A., 2004. Resilient back propagation learning algorithm for recurrent fuzzy neural networks. IET Electronics Letters 40, 57–58.
Mastorocostas, P.A., Theocharis, J.B., 2002. A recurrent fuzzy neural model for dynamic system identification. IEEE Transactions on Systems, Man, and Cybernetics—Part B 32, 176–190.
Piche, S.W., 1994. Steepest descent algorithms for neural network controllers and filters. IEEE Transactions on Neural Networks 5, 198–212.
Riedmiller, M., Braun, H., 1993. A direct adaptive method for faster backpropagation learning: the RPROP algorithm. Proceedings of the IEEE International Joint Conference on Neural Networks, 586–591.
Takagi, T., Sugeno, M., 1986. Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man, and Cybernetics 15, 116–132.
Tsoi, A.C., Back, A.D., 1994. Locally recurrent globally feedforward networks: a critical review of architectures. IEEE Transactions on Neural Networks 5, 229–239.
Winters, P.R., 1960. Forecasting sales by exponentially weighted moving averages. Management Science 6, 324–342.