

Journal of Intelligent and Robotic Systems 31: 7–68, 2001. © 2001 Kluwer Academic Publishers. Printed in the Netherlands.


Computational Intelligence Techniques for Short-Term Electric Load Forecasting

SPYROS TZAFESTAS and ELPIDA TZAFESTAS
Intelligent Robotics and Automation Laboratory, Institute of Communication and Computer Systems, Department of Electrical and Computer Engineering, National Technical University of Athens, Zographou 15773, Athens, Greece; e-mail: {tzafesta,brensham}@softlab.ece.ntua.gr

(Received: 18 March 2000; in final form: 4 January 2001)

Abstract. Electric load forecasting has received increasing attention over the years from academic and industrial researchers and practitioners due to its major role in the effective and economic operation of power utilities. The aim of this paper is to provide a collective unified survey study on the application of computational intelligence (CI) model-free techniques to the short-term load forecasting of electric power plants. All four classes of CI methodologies, namely neural networks (NNs), fuzzy logic (FL), genetic algorithms (GAs) and chaos, are addressed. The paper starts with some background material on model-based and knowledge-based forecasting methodologies, revealing a number of key issues. Then, the pure NN-based and FL-based forecasting methodologies are presented in some detail. Next, the hybrid neurofuzzy forecasting methodology (ANFIS, GARIC and Fuzzy ART variations) and three other hybrid CI methodologies (KB-NN, Chaos-FL, Neurofuzzy-GA) are reviewed. The paper ends with eight representative case studies, which show the relative merits and performance that can be achieved by the various forecasting methodologies under a large repertory of geographic, weather and other peculiar conditions. An overall evaluation of the state of the art of the field is provided in the conclusions.

Key words: short-term load forecasting, computational intelligence, neural networks, fuzzy logic, genetic algorithms, chaos.

1. Introduction

The field of forecasting was and still is one of the principal areas of scientific investigation due to its numerous applications in real life throughout the years [7, 53, 67]. A large variety of techniques (deterministic, stochastic) have been developed, with relative advantages and disadvantages for each particular application. Therefore an effort was made to develop interactive human–computer systems that help the user to select the best method for a given application, in particular regarding the level of accuracy achieved [41, 102].

The accuracy of short-term load forecasting (STLF) in electric power systems has a strong influence on their economic operation. Many important decisions depend heavily on this forecast, namely [23, 70]:

• scheduling of fuel purchases,
• scheduling of power generation,
• planning of energy transactions,
• assessment of system safety.

Modern-life requirements force the electric power utilities to operate at the highest possible efficiency, which calls for very accurate forecasts. Large forecasting errors may lead to either excessively risky or excessively conservative scheduling, which can in turn result in undesirable economic penalties. Higher forecast values may force the start-up of too many power units and therefore higher reserves than actually required, while lower forecast values may lead to inability to provide the agreed power reserves. In both cases a higher operational cost is incurred.

Typical targets (variables) under prediction in STLF include:

• half-hour-ahead forecast,
• one-hour-ahead forecast,
• 24-hour load forecast,
• peak load forecast over a 24-hour period,
• peak load forecast over a 1-week period,
• total daily energy consumption.

The electric load is related in complex and nonlinear fashions to various factors such as the time of the day, the day of the week, the season of the year, the climatic conditions, and the past usage patterns. Therefore several methods to model these relationships have been applied over the years, such as time series, regression models, state-space models, and others. These techniques, which are called model-based (conventional or classical) techniques, will be briefly discussed in the next section. Another class of techniques used for electric load forecasting is the class of knowledge-based (KB) or expert system (ES) techniques. This approach, which employs the knowledge and analogical reasoning of experienced human operators, will be outlined in Section 3. The core of the paper (Sections 4–6) is devoted to the third class of techniques, namely the class that uses neural networks, fuzzy inference and their combination, including genetic algorithms (GAs).

The neural network-based forecasting methodology will be presented in Section 4. Here, a functional relationship between the climatic variables and the electric load is not needed, because a neural network (NN) can intrinsically generate this functional relationship by learning from training data. In other words, the nonlinear mapping (model) involved is implicitly embedded in the NN. This means that the NNs can represent a load (pattern) and actually perform a pattern recognition function. This pattern depends on the training cases (inputs, examples) provided to the NN, and so special care is needed for selecting the most appropriate input cases. Other problems that have to be addressed in NN-based forecasting are the selection of the NN type (model) and structure, as well as of the learning/training algorithm. These issues will be discussed in Section 4.


The main drawback of NN-based forecasting is its inability to provide accurate forecasts on weekends and public holidays, since for these special days there are not sufficient input cases available to train the network.

Neural network-based methods cannot treat the underlying uncertainty and common-sense knowledge usually employed by human experts to forecast load shapes. This capability is possessed by fuzzy-logic (FL)-based systems, which implement common-sense reasoning through the use of fuzzy sets and fuzzy rules. The fuzzy-logic forecasting methodology is outlined in Section 5. By combining NNs and FL one can enhance the good features of both and minimize the limitations of each. Indeed, experiments have verified the superiority of neurofuzzy (NF) forecasters for both the weekdays and the peculiar days (Sundays, holidays). This hybrid NF approach will be discussed in Section 6.

Many other hybrid forecasting methods have been proposed, such as KB-NN (neuro-expert), chaos-FL, neuro-genetic (NN-GA or NG) or fuzzy-genetic (FL-GA or FG) methods. A short discussion of these methods will be provided in Section 7. The above methods (Sections 4–7) are collectively known as intelligent forecasting methods or computational intelligence (CI) forecasting methods. The paper provides a set of case studies drawn from the literature which illustrate the capabilities of the CI forecasting methods and their relative merits and limitations.

2. Model-Based Forecasting Methodologies

Model-based STLF methodologies employ two fundamental models, namely peak load models and load shape models [2, 6, 10, 13, 14, 20–22, 66, 73, 112]. In the first type of models the daily or weekly peak involves two components: a weather-independent base load and a weather-dependent part added to the base load [2, 6, 21]. Load shape models represent the load by a discrete time series extended over the forecasting period, and are distinguished into static and dynamic models. The static models represent the load by a linear combination of base time functions such as sinusoids, exponentials, polynomials or eigenfunctions. The forecasting problem then reduces to that of estimating the coefficients (parameters) of this linear representation (model). The estimation is performed via linear regression or exponential smoothing methods using a set of recent load data. These time-series methods do not incorporate the climatic influence on the load and have found limited application due to the resulting low-accuracy forecasts. The influence of the climatic conditions, the recent load profile and other relevant random phenomena is incorporated in the dynamic load shape models. Two kinds of models are typically employed:

– ARMA: autoregressive moving average model;
– Stochastic state-space model.

The ARMA model consists of two basic load components, a deterministic and a stochastic component. The first stands for the periodic part of the load shape, while the second represents the deviation due to random weather effects. The deterministic component is given by a time-dependent periodic nonlinear function, and the stochastic one is represented by an ARMA model if the underlying stochastic process is stationary with finite variance, or an ARIMA (autoregressive integrated moving average) model if the underlying stochastic process is non-stationary. The ARIMA model can be transformed into a state-space model and conversely [57]. If the state-space model is used, the load and weather forecasts (states) are updated using Kalman or extended Kalman filtering. Due to the recursive nature of the Kalman filter, the new forecast can be found using the results from the previous hour rather than recomputing the effects over many past hours. Another problem with the time series approach is numerical instability. One reason is that the time series methods usually employ computationally cumbersome matrix-oriented adaptive algorithms, which in many cases may be unstable. The ARMA/ARIMA-based regression techniques use linear or piecewise-linear representations for the weather–load functional relationships. These linear relationships are typically assumed without any justification. But the actual weather–load functional relationship may be nonlinear, and so ARMA/ARIMA-based forecasting has a limited accuracy capacity. This problem is overcome by using KB forecasting, and even better by CI forecasting techniques, as discussed in the following sections.
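As a concrete (and deliberately simplified) illustration of the deterministic-plus-stochastic decomposition described above, the sketch below subtracts a known daily periodic component from a synthetic hourly load series and fits an AR(2) model to the residual by least squares; the series, the model order and all numerical values are illustrative assumptions, not taken from the survey.

```python
import numpy as np

# Synthetic hourly load = deterministic daily periodic part + noise.
rng = np.random.default_rng(0)
hours = np.arange(24 * 28)                               # four weeks of hourly data
periodic = 500 + 100 * np.sin(2 * np.pi * hours / 24)    # deterministic component
load = periodic + rng.normal(0, 5, hours.size)           # observed load

residual = load - periodic                               # stochastic component

# Fit AR(2): r[t] ~ a1*r[t-1] + a2*r[t-2], via least squares.
X = np.column_stack([residual[1:-1], residual[:-2]])
y = residual[2:]
a1, a2 = np.linalg.lstsq(X, y, rcond=None)[0]

# One-hour-ahead forecast = periodic part + AR prediction of the residual.
next_hour = hours[-1] + 1
forecast = (500 + 100 * np.sin(2 * np.pi * next_hour / 24)
            + a1 * residual[-1] + a2 * residual[-2])
print(round(float(forecast), 1))
```

An ARIMA variant would difference the residual series first; the state-space alternative mentioned above would instead update the same quantities recursively with a Kalman filter.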

The basic issues that have to be addressed when developing a reliable model-based forecasting method are:

– Selection of the most relevant data and assurance that no erroneous or anomalous data are used for the forecasting [101].

– Determination of the meteorological variables that have a strong impact on (correlation with) electric loads (e.g., temperature, cloud cover, wind speed, humidity).

– Reliable feature extraction to capture the underlying dominant information about the load shapes.

– Assurance of accurate weather forecasting, which in some cases may be improved using the load forecasting model.

– Decision about the forecasting model which is to be used for each forecasting target, e.g., weekdays, special days, season of the year, etc.

– The requirement that the forecasting model must be able to extrapolate with a reasonable accuracy during cold snaps, heat waves or pickup loads.

– The desire that the forecasting model must be able to adapt to the system's thermal inertia and to face the growth of the load.

Again, all of these issues can be accommodated by KB, NN and FL methods and their combinations.


3. Knowledge-Based Forecasting Methodology

Knowledge-based (expert) systems (KBS/ES) are designed employing artificial intelligence (AI) concepts and methods, and emulate (mimic) human performance, presenting a human-like action to the user. KBS are currently finding application in a large repertory of human-life domains, including engineering and non-engineering domains, e.g., electric power systems, process control, robotic systems, manufacturing systems, system fault detection/diagnosis, managerial systems, finance and business systems, medical diagnosis, etc. [85, 97, 114, 116]. Among them the electric load forecasting field has attracted notable attention [27, 28, 49, 77, 78, 98].

The basic components of a KBS/ES are:

– A knowledge base (KB) which contains facts, rules, heuristics and procedural knowledge (this is in addition to the normal data base (DB) which contains all the numerical data of the application concerned).

– An inference engine (IE) consisting of reasoning/inference or problem-solving techniques for using the knowledge to make decisions, and

– A user interface (UI) in natural language or interactive graphics or voice (speech) input form. The explanation generator of an ES provides answers to "how", "why" and "what-if" queries made by the user.

Expert systems used in process forecasting and supervision are designed so as to possess capabilities like:

– Knowledge acquisition,
– Intelligent data interpretation,
– Coping with process disturbances,
– System state/output prediction,
– Prediction of consequences of actions,
– Economic optimization, etc.

In real-time applications a number of design considerations, beyond those which are considered in standard ESs, are of great importance. Two of them are: the "dynamic nature of the domain facts" and the "large size of the knowledge base" required for a realistic implementation. In a large-scale industrial process (e.g., power plant, refinery, etc.) the operator receives thousands of measurements and alarms, and the plant status can change significantly within a few minutes. Exhaustive search is therefore not possible in real-time. Thus the basic issues in designing an ES for this purpose are:

– Fast recognition of process conditions which are potentially significant,
– Use of appropriate rule sets and focus on these problem areas for prognosis, diagnosis, and procedural advice.

The four hierarchical layers (functions) of process management are:

Page 6: Computational Intelligence Techniques for Short-Term Electric Load Forecasting

12 S. TZAFESTAS AND E. TZAFESTAS

– the direct interaction with the process,
– the estimation/forecasting function,
– the supervision function,
– the executive production scheduling and operational management function.

The first layer, which is actually the interface between the process and the decision and control units, involves three components, namely: data/knowledge acquisition, event monitoring, and the direct control function which implements the policy selected by the supervisory layer. The second layer (called the Items Generator) processes the data/knowledge provided by the first layer to estimate and predict the state, the output and other variables of the process. The supervisory unit (layer) supervises the operation of the first (lower) layer controller by renewing the parameters of the control law and of the monitoring functions, using the results of the items generator. Finally, the fourth layer involves high-level management activities such as production scheduling and operational management.

Let us now review how the above general concepts were applied in the STLF field [27, 28, 49, 77, 78]. The KB technique was first applied in [77], where rules for load forecasting were derived using the historical relationship between the load and the dry-bulb temperature for a certain season, day type and hour of the day. The goal was to forecast the load in a one- to twenty-four-hour time frame for a Virginia electric utility. Four sets of forecasts were prepared, one for each season, on the basis of historical relationships between weather and load in Winter, Summer and Autumn. During the period of season change two forecasts were run (one for the current season and the other for the upcoming season). The performance of these forecasts was monitored and the better one was put on-line for viewing. The other one continued to run in the background. The methodology of [77] involves the following steps:

Step 1: Variable identification. To this end, correlation analysis between the historical load and the weather parameters (dry-bulb and wet-bulb temperatures, relative humidity, wind direction, wind speed) was employed. For example, the result for Spring was the selection of the dry-bulb temperature as the weather parameter used for load forecasting. The other variables that were found of interest for the load forecasting process are: the season, the seasonal load shape (i.e., the load shape impact), the day-of-the-week impact, and the change ΔT in the dry-bulb temperature.

Step 2: Relating variables and system load. From the interplay of weather and load shape impacts it was found that:

(i) the actual load value of the hour just elapsed would reflect the prevailing weather conditions,

(ii) the historical change in the load (ΔMW) for a given day type and season would represent a typical load shape impact, and


(iii) the change in the hourly load (ΔMW) depends not only on the day type and season, but also on the prevailing weather conditions.

Step 3: 6-hour forecasting. To this end, a six-step predictor is employed. For each forecasting step, the forecast variable is the change in load (ΔMW) from hour λ − 1 to the forecast hour λ. The forecasted ΔMW is then added to the load (actual or forecasted) for hour λ − 1. The forecast ΔMW is based on three sets of data points in history which were found empirically to best represent the prevailing conditions on the current day (two days were found to be insufficient, and four days did not give any marginal benefit).

Step 4: 24-hour forecasting. The basic premise of the algorithm for the 24-hour forecast is essentially the same as for the 6-hour forecast, namely:

(i) the load shape impact is preserved via the judicious choice of data from the DB,

(ii) the influence of weather conditions dictated the rules for forecasting the load shape modification, and

(iii) the “inertia” (lag) in the load response suggested a 2-pass forecast.

In [27, 28] the standard practice of load forecasting at the Taiwan Power Company (TPC) was employed for developing an expert system which is capable of forecasting the hourly loads. The system was based on a 5-year data base and implemented in PROLOG with PASCAL subroutines for fast numerical computation. This system starts with the identification of the day type and then proceeds to figuring out the daily peak load Lp and the daily trough load Lt. The result of intensive interviews with the expert operators and careful study of the 5-year hourly load patterns is the identification of eleven distinct day types for the Taiwan power system. Because the load shape for some particular day type (e.g., weekday) may vary slightly from season to season, the desired load patterns for this day are determined by averaging the normalized hourly loads on several recent days of that day type. The normalization of the 24-hour loads of each day in the 5-year period is performed using the following formula:

Ln(i) = [L(i) − Lt]/[Lp − Lt], i = 1, 2, . . . , 24,

where Ln(i) is the normalized load for hour i, L(i) is the load for hour i, Lt is the daily trough load, and Lp is the daily peak load. From this formula it follows that the hourly loads L(i) are given by

L(i) = Ln(i)(Lp − Lt) + Lt, i = 1, 2, . . . , 24,

where Lp and Lt must be available. From a detailed examination of the influence of weather variables on the load, the following weather-sensitive model was derived:

Lpj = ATpj + B,


where Lpj is the peak load on day j, Tpj is the equivalent high temperature of the system on day j, and A, B are parameters that must be estimated (via linear regression) using the hourly loads and weather data in the DB. Tpj is a weighted average of the high temperatures in the three areas of Taiwan (northern, central, southern), using the load distribution factors (FN, FC, FS) of these areas as weighting factors. The trough load Ltj is predicted in the same way, except that the low temperatures should be used instead of the high temperatures. The identification of the day type is performed in a tree-like search form as dictated by PROLOG. A typical dialog run is the following.

DIALOG.
PLEASE ENTER THE DATE TO BE FORECASTED
MONTH (1–12): 3
DATE (1–31): 5
DAY (1–7): 6
IS THERE ANY SPECIAL EVENT (y/n)? y
IS THERE A TYPHOON? n
IS IT AROUND THE LUNAR YEAR (12,28–1,6)? n
IS IT A HOLIDAY? n
IS IT AN EXTRA HOLIDAY? n
IS IT A LUNAR FESTIVAL? n
IS IT AN EXTRA WORKDAY? y
DAY TYPE IS 0

The mean absolute forecasting error achieved for a year by this ES was 2.52%, as compared to 3.86% obtained by the statistical method under the same conditions.
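The TPC-style normalization and the weather-sensitive peak model can be sketched in a few lines; the temperature/peak records, the sample day shape and the assumed trough forecast below are invented for illustration, not TPC data.

```python
import numpy as np

# Normalize a day's hourly loads: L_n(i) = (L(i) - L_t)/(L_p - L_t).
def normalize(day_loads):
    lt, lp = day_loads.min(), day_loads.max()   # trough and peak of the day
    return (day_loads - lt) / (lp - lt)

# Historical (equivalent high temperature, peak load) pairs for the
# weather-sensitive model L_p = A*T_p + B, estimated by linear regression.
temps = np.array([28.0, 30.0, 32.0, 34.0, 36.0])
peaks = np.array([900.0, 950.0, 1000.0, 1050.0, 1100.0])
A, B = np.polyfit(temps, peaks, 1)

day = 800 + 200 * np.sin(np.pi * np.arange(24) / 24)  # a sample day shape
shape = normalize(day)                                # normalized pattern

lp_hat = A * 33.0 + B          # predicted peak for forecast temperature 33
lt_hat = 780.0                 # predicted trough (assumed, from the low-temp model)

# Rebuild the hourly forecast: L(i) = L_n(i)*(L_p - L_t) + L_t.
forecast = shape * (lp_hat - lt_hat) + lt_hat
```

By construction the reconstructed day attains exactly the predicted peak and trough, which is the point of normalizing the historical shapes before scaling them to the forecast extremes.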

A considerably better forecasting accuracy (errors between 1.22% and 2.7%) was reported in [27], where the features of the statistical methodology were combined with the KB methodology in conjunction with the so-called pairwise comparison (PC) technique [79, 81], which is appropriate whenever accurate models are not easy to design. The PC technique prioritizes categorical variables and leads to a site-independent forecasting expert system. In STLF it works as follows. For a given target hour in the future at which a load forecast is desired, there exists a set of selected similar records from the data bank (history). This set of records (named the similar set) is chosen from the history using the characteristics of the previous section. Then the regression method is employed over the similar set. Actually, the PC method quantifies the forecasting categorical variables (e.g., hour of the day (hr), weekday (wd), sky cover (sc), etc.) which by nature cannot as such be used in the regression. In [27] KB rules were embedded in the PC algorithm to allow operator intervention and make the algorithm site-independent and capable of facing changing conditions. Each day is divided into the following segments: night and inertia (9.00 p.m.–4.00 a.m. on weekdays and weekends), morning peak (5.00 a.m.–8.00 a.m. on weekdays and 7.00 a.m.–11.00 a.m. on Saturdays and Sundays), daytime hours (9.00 a.m.–12.00 noon and 1.00 p.m.–4.00 p.m.), and evening peak (5.00 p.m.–8.00 p.m.). These time segments, which are selected to represent the human factor and the weather factor, may be slightly different from season to season. The forecast is performed in a segment-by-segment fashion. The forecasting method involves the following steps.

Step 1: Determine the criteria for selecting similar days and loads from the DB (on the basis of some knowledge about the day type of concern).

Step 2: Select the similar set using the above criteria (i.e., a limited set from the past history).

Step 3: Adjust the selected set to accommodate any special feature involved (e.g., annual growth, special hours of the week, base load of the day, and so on).

Step 4: Choose the categorical variable(s) and balance the similar set Hj, j = 1, 2, . . . , m, i.e., assure that the average Vav of each variable V of Hj is equal to the target value Vo. To this end, use the adjustment tables of the site concerned. The mean loads over Hj of the balanced similar set are used for the L-values.

Step 5: Apply the PC technique to the L-values (if the similar set involves a sufficient number of points) and replace the categorical variables by their priority values. Then, find the load forecast using statistical regression (which gives the least-squares error estimate of the load).

The choice of the similar set is knowledge based, and the same knowledge is used for the adjustment of the selected data (Step 3). This forecasting algorithm relies only on the similar set, which is chosen from history for every new target hour, and does not need any preselected model. It is therefore inherently updatable and adaptable to new information coming with time.
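A minimal sketch of the similar-set idea behind Steps 1–5, assuming an invented history format and using plain least-squares regression over the selected records (the PC priority-value computation and the balancing adjustments of Steps 3–4 are omitted):

```python
import numpy as np

history = [
    # (day_type, hour, temperature, load)  -- day_type: 0 weekday, 1 weekend
    (0, 18, 25.0, 950.0), (0, 18, 27.0, 990.0), (0, 18, 29.0, 1030.0),
    (1, 18, 26.0, 800.0), (0, 18, 31.0, 1070.0),
]

target = (0, 18, 30.0)          # weekday, 18:00, forecast temperature 30 C

# Steps 1-2: the similar set = past records with the same day type and hour.
similar = [(t, l) for d, h, t, l in history if (d, h) == target[:2]]

# Step 5: least-squares regression of load on temperature over the similar set.
temps = np.array([t for t, _ in similar])
loads = np.array([l for _, l in similar])
a, b = np.polyfit(temps, loads, 1)
forecast = a * target[2] + b
```

Because the similar set is rebuilt for every new target hour, no preselected model is carried over between forecasts, which is exactly the adaptability property noted above.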

4. Neural Network-Based Forecasting Methodology

Neural networks (NNs) constitute the AI area [5, 24, 31] that has found the widest application in electric load forecasting, since they do not rely on human experience (like the KBSs) but attempt to draw by themselves a link between sets of inputs and sets of outputs [3, 9, 15, 29, 42–45, 50, 60, 69, 71, 72, 74, 96, 115]. NNs have the capability to adapt to forecasting environments via self-learning based on training examples, so as to provide the relations (functions, mappings) that link the underlying variables to the output, i.e., to the load forecast.


Figure 1. (a) Multilayer feedforward NN: 3-layer MLP (one hidden layer); (b) the neuron model (a single node).

4.1. NN MODELS USED IN STLF

Among the various NN models, the one that has been used in almost all published works is the multilayer feedforward network (MLP: multi-layer perceptron) with back-propagation (BP) of the error. Sometimes, to improve the forecasting, the standard MLP was modified by adding to the output a direct linear combination of the inputs, as shown in Figure 2 [71]. Another NN model used is Kohonen's self-organizing map (SOM) model (see the paper by Carpinteiro et al.* in this issue).

The general structure of a feedforward MLP is shown in Figure 1; it involves the input layer, the output layer, and one or several hidden layers of nodes. It is the inclusion of one or more hidden layers that makes the NN capable of approximating nonlinear functions (mappings) or classifying patterns into nonlinearly separable classes [5, 24, 31].

The input–output equations of the kth neuron (Figure 1(b)) are:

$$y_k = \Phi(u_k - \theta_k), \qquad u_k = \sum_{j=1}^{n} w_{kj} x_j, \quad (1)$$

* Carpinteiro, O. A. et al.: A hierarchical self-organizing map model in short-term load forecasting, J. Intell. Robotic Systems, Vol. 31, Nos. 1–3 (2001).


Figure 2. 3-layer MLP enhanced with an extra linear feedforward term.

where x1, x2, . . . , xn are the input elements (signals); uk is the output of the summer; θk is a given threshold; wk1, wk2, . . . , wkn are the neuron's synaptic weights; and Φ(·) is the neuron's activation function, which may have one of the sigmoid forms shown in Table I.

The output of the 3-layer MLP is equal to

$$y = \sum_{k=1}^{N} w_k y_k, \quad (2)$$

where N is the number of hidden nodes, wk (k = 1, 2, . . . , N) are the synaptic weights between the hidden nodes and the output node, and yk is the output of the kth hidden node given by (1):

$$y_k = \Phi\Big(\sum_{j=1}^{n} w_{kj} x_j - \theta_k\Big), \quad (3)$$

where n is the number of input elements.

The MLP enhanced with a linear feedforward function of the inputs has the form shown in Figure 2. The output of this enhanced feedforward NN is given by

$$y = \sum_{k=1}^{N} w_k y_k + \sum_{j=1}^{n} v_j x_j, \quad (4)$$

where vj (j = 1, 2, . . . , n) are the corresponding input–output weights.
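Equations (1)–(4) can be checked directly in code; the network sizes and weight values below are arbitrary illustrations, not trained parameters.

```python
import numpy as np

def phi(v, lam=1.0):
    # Logistic activation (Table I), used for the hidden nodes.
    return 1.0 / (1.0 + np.exp(-lam * v))

def enhanced_mlp(x, W, theta, w_out, v_lin):
    """Eq. (4): y = sum_k w_k*phi(sum_j w_kj*x_j - theta_k) + sum_j v_j*x_j."""
    hidden = phi(W @ x - theta)         # Eqs (1)/(3): hidden-node outputs y_k
    return w_out @ hidden + v_lin @ x   # Eq. (2) plus the direct linear term

# Illustrative weights: 2 inputs, 3 hidden nodes (values are arbitrary).
x = np.array([0.5, -0.2])
W = np.array([[0.1, 0.4], [-0.3, 0.2], [0.5, -0.1]])   # hidden weights w_kj
theta = np.array([0.0, 0.1, -0.2])                     # thresholds theta_k
w_out = np.array([0.6, -0.4, 0.3])                     # output weights w_k
v_lin = np.array([0.2, 0.1])                           # linear weights v_j
print(enhanced_mlp(x, W, theta, w_out, v_lin))
```

Setting `v_lin` to zero recovers the plain 3-layer MLP of Eq. (2); setting `w_out` to zero leaves only the direct linear term, which is the enhancement of Figure 2.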


Table I. Types of NN sigmoid activation functions

1. Threshold (step) function: $\Phi(v) = 1$ for $v \geq 0$; $0$ for $v < 0$.
2. Signum function: $\Phi(v) = 1$ for $v > 0$; $0$ for $v = 0$; $-1$ for $v < 0$.
3. Piecewise-linear (saturation) function: $\Phi(v) = -1$ for $v \leq -\tfrac{1}{2}$; $v$ for $-\tfrac{1}{2} < v < \tfrac{1}{2}$; $1$ for $v \geq \tfrac{1}{2}$.
4. Logistic function (monopolar sigmoid): $\Phi(v) = \dfrac{1}{1 + \exp(-\lambda v)}$.
5. Hyperbolic tangent function (bipolar sigmoid): $\Phi(v) = \tanh\big(\tfrac{\lambda v}{2}\big) = \dfrac{1 - \exp(-\lambda v)}{1 + \exp(-\lambda v)}$.
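The activation functions of Table I can be written out as vectorized functions (the sign conventions of the piecewise-linear form follow the reconstruction above); the final assertion verifies the tanh identity of entry 5 numerically.

```python
import numpy as np

def threshold(v):
    # Entry 1: 1 for v >= 0, else 0.
    return np.where(np.asarray(v) >= 0, 1.0, 0.0)

def signum(v):
    # Entry 2: sign of v (1, 0 or -1).
    return np.sign(v)

def piecewise_linear(v):
    # Entry 3: saturate at -1 and 1 outside (-1/2, 1/2), identity inside.
    v = np.asarray(v, dtype=float)
    return np.where(v >= 0.5, 1.0, np.where(v <= -0.5, -1.0, v))

def logistic(v, lam=1.0):
    # Entry 4: monopolar sigmoid with slope parameter lambda.
    return 1.0 / (1.0 + np.exp(-lam * v))

def bipolar_sigmoid(v, lam=1.0):
    # Entry 5: tanh(lam*v/2).
    return np.tanh(lam * v / 2.0)

# Check the identity tanh(v/2) = (1 - e^{-v}) / (1 + e^{-v}) numerically.
v = np.linspace(-3, 3, 13)
assert np.allclose(bipolar_sigmoid(v), (1 - np.exp(-v)) / (1 + np.exp(-v)))
```

As noted in Section 4.2, only the two differentiable forms (entries 4 and 5) are usable with gradient-based learning.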


4.2. NN LEARNING

The most important characteristic of NNs is their ability to learn from their environment and improve their performance as time passes. Actually, learning is the process of updating the free parameters of the network through adjustments enforced on them by the environment.

The general learning rule has the form:

$$w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n), \quad (5)$$

where Δwkj(n) is the adjustment of the weight wkj(n) at the nth time instant. The learning algorithms differ in the way the weight adjustment is formed. Thus we have: supervised learning (where the desired response is provided by a teacher), reinforcement learning (where good performance is rewarded and bad performance is penalized via a critic (reinforcement) signal obtained from the NN's environment), and unsupervised learning (where neither a teacher nor a critic is available). The typical method for computing Δwkj(n) is the gradient (steepest descent) method, namely

$$\Delta w_{kj}(n) = -\eta \nabla_{w_{kj}} J(n) = -\eta \frac{\partial J(n)}{\partial w_{kj}(n)}, \quad (6)$$

where J(n) is a suitable error (cost) function, $\nabla_{w_{kj}} J(n) = \partial J(n)/\partial w_{kj}(n)$ is the gradient of J(n) along wkj(n), and η is a constant called the learning rate. To ensure that the partial derivative has finite values, the logistic or the hyperbolic tangent sigmoid function is used as the activation function of the neural nodes.

If y_{dj}(n) is the desired neuron output at time n in the jth iteration, and y_j(n) the real output obtained from the NN, then the error is given by

e_j(n) = y_{dj}(n) − y_j(n).  (7a)

Then, the typical instantaneous cost (error) function used is given by

J(n) = (1/2) Σ_{j∈C} e_j²(n),  (7b)

where C is the set of neurons in the network output. If m is the total number of training patterns (examples), then one can compute the mean squared error (MSE) as:

J = (1/m) Σ_{n=1}^{m} J(n).  (8)

The computation of the adjustments Δw_{kj}(n) is performed via the back-propagation (BP) algorithm. The details can be found in [5, 24, 31, 115].
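As an illustration of (5)–(7), a single steepest-descent update for one linear output neuron can be sketched as follows (variable names are ours; the full multi-layer BP recursion is in the cited references):

```python
import numpy as np

def gradient_step(w, x, y_d, eta=0.1):
    """One update per Eqs (5)-(6) for a linear neuron y = w.x with
    J = e^2 / 2 and e = y_d - y (Eq. (7a)), so that dJ/dw = -e*x
    and the steepest-descent adjustment is delta_w = eta * e * x."""
    e = y_d - w @ x
    return w + eta * e * x

# Repeated application drives the instantaneous error toward zero.
w = np.zeros(3)
x = np.array([1.0, 0.5, -0.2])
for _ in range(200):
    w = gradient_step(w, x, y_d=2.0)
```

For a multi-layer network, BP applies the same rule layer by layer, propagating the error derivative backward through the sigmoid nonlinearities.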

Two drawbacks of multi-layer neural networks are:


(i) the distributed form of the nonlinearity which, together with the high connectivity, makes their theoretical study difficult, and

(ii) the presence of hidden neurons, which does not allow a clear and precise picture of the learning process.

However, despite the above drawbacks, this type of network has found wide applicability and use in forecasting and other areas, with well accepted results.

4.3. SELECTION OF TRAINING EXAMPLES FOR STLF

The accuracy provided by a NN depends (among others) on the proper selection of the training examples, which must carry important information about the process that is to be approximated (estimated) by the NN. In electric STLF, three methods employed for selecting the training examples are the following [71, 74]:

– Moving data window method, where the immediately preceding two (or more) weeks of data are used for training, and the load is forecasted for the present week.

– 6-Day period method, where the training examples for a 6-day period in the same time frame are selected, and the forecasts are performed daily.

– Similar load and temperature shapes method, where the days that have similar load and temperature values are selected from the past to train the NN. The third method is based on some suitable criterion that measures the degree of similarity (distance) between past values and the present values of the load.

Suppose that x^T = [x_1, x_2, . . . , x_n] is the input vector and y the variable to be forecasted (here the electric load). Then the input–output vector (pattern, example) is X^T = [x^T, y]. Let the historical database consist of M potential training patterns {[x^{(k)T}, y^{(k)}], k = 1, 2, . . . , M}, and let [x^T, y] be the current daily input–output vector. Assuming that either M is very large or not all the examples are useful for forecasting purposes, it is desired to select a subset of the database for which the input conditions are similar (i.e., for which the difference between x^{(k)} and x is very small). Clearly, since the NN is a map from x to y, if x and x^{(k)} (k = 1, 2, . . . , M*) are close enough, then the network output y will also be close to the output y^{(k)}. The closeness of x and x^{(k)} is measured by some kind of distance d(x, x^{(k)}) between them. Three such distances (criteria) are the following [71]:

1. d₁(x, x^{(k)}) = { Σ_{i=1}^{n} (Δx_i/σ_i)² }^{1/2},  (9a)

2. d₂(x, x^{(k)}) = { Σ_{i=1}^{n} (γ_i Δx_i/σ_i)² }^{1/2},  (9b)


3. d₃(x, x^{(k)}) = Σ_{i=1}^{n} |γ_i| · |Δx_i|,  (9c)

where Δx_i = x_i^{(k)} − x_i, σ_i is an estimate of the standard deviation of the ith component of the input vector, and γ_i = ∂y/∂x_i is the sensitivity of the NN's output y with respect to the input component x_i. For the NN of Figure 1(a), where the logistic activation function with λ = 1 is used, the sensitivity γ_i is given by (see (1)):

γ_i = Σ_{j=1}^{n} w_j w_{ij} exp(−υ_j) / [1 + exp(−υ_j)]²,  υ_j = u_j − θ_j.  (10)

For the NN of Figure 2, γ_i is equal to

γ_i = v_i + Σ_{k=1}^{N} w_k ∂y_k/∂x_i ≈ v_i,  (11)

where v_i is the input–output weight appearing in (4).

A general technique for reducing the dimensionality of highly-dimensional (n-dimensional) multivariate data is the principal component analysis (PCA) technique, in which the information is extracted by finding the directions in the n-dimensional input data space along which the data elements possess the largest variations [95].
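The similar-days selection by criterion (9a) can be sketched as follows (a minimal illustration; the array names and the subset size are our assumptions):

```python
import numpy as np

def select_similar_days(x, X_hist, sigma, n_select=10):
    """Rank historical day-input vectors (rows of X_hist) by the
    distance d1 of Eq. (9a) to the current input x, and return the
    indices of the n_select closest days."""
    dx = (np.asarray(X_hist, float) - x) / sigma   # componentwise (x_i^(k) - x_i) / sigma_i
    d1 = np.sqrt((dx ** 2).sum(axis=1))            # Eq. (9a) per historical day
    return np.argsort(d1)[:n_select]
```

Criterion (9b) is obtained by additionally multiplying each column of `dx` by the sensitivity γ_i of Eq. (10) or (11), so that inputs to which the load is more sensitive count more in the distance.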

4.4. DATA PREPARATION

To improve (and sometimes ensure) convergence, the data must be scaled or normalized so as to unify the very different ranges of the originally collected data. Three methods for this preparation of the input/output data used in the STLF literature are the following:

• Scale each input (load, temperature) component and the output by dividing its actual value by the respective value in the data set.

• Normalize the data between the maximum and minimum values, i.e.,

x_{i,n}^{(k)} = (x_i^{(k)} − x_{i,min}) / (x_{i,max} − x_{i,min}),  y_n = (y^{(k)} − y_min) / (y_max − y_min),  (12)

where x_{i,n}^{(k)} (y_n) is the ith input component (output) value after normalization, and x_{i,max} (x_{i,min}), y_max (y_min) are the respective maximum (minimum) values over the data set (k = 1, 2, . . . , M).

• Scale the data as

x_{i,n} = (x_i^{(k)} − x̄_i) / σ_{x_i},  y_n = (y^{(k)} − ȳ) / σ_y,  (13)

where x̄_i (ȳ) is the average value of x_i^{(k)} (y^{(k)}), and σ_{x_i} (σ_y) is an estimate of the standard deviation of x_i^{(k)} (y^{(k)}).
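The two statistical scalings (12) and (13) can be sketched column-wise over a data matrix (function names are ours):

```python
import numpy as np

def minmax_normalize(X):
    """Eq. (12): map every column of X onto [0, 1] over the data set."""
    X = np.asarray(X, float)
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    return (X - xmin) / (xmax - xmin)

def zscore_scale(X):
    """Eq. (13): subtract the column mean and divide by the estimated
    standard deviation of each column."""
    X = np.asarray(X, float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
```

Min–max normalization matches the [0, 1] input requirement of the BP algorithm mentioned in Section 4.6; the z-score variant is preferable when occasional outliers would otherwise compress the bulk of the data into a narrow band.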


4.5. FORECASTING ACCURACY CRITERIA

The quality (accuracy) of the forecasting is measured using several criteria. The most frequently used error criteria are the following.

1. Mean Absolute Error (MAE):

MAE = (1/M) Σ_{i=1}^{M} |actual(i) − forecast(i)|,  (14)

where M is the total number of data points, actual(i) is the ith actual value, and forecast(i) is the ith forecast.

2. Mean Absolute Percentage Error (MAPE):

MAPE = (1/M) Σ_{i=1}^{M} [ |actual(i) − forecast(i)| / actual(i) ] × 100%.  (15)

3. Mean Squared Error (MSE):

MSE = (1/M) Σ_{i=1}^{M} [actual(i) − forecast(i)]².  (16)

4. Relative Squared Error (RSE):

RSE = Σ_{i=1}^{M} [actual(i) − forecast(i)]² / Σ_{i=0}^{M−1} [actual(i) − actual(i + 1)]².  (17)

5. Relative Daily Error (RE):

RE = |actual(i) − forecast(i)| / (peak load of the day).  (18)

The MAE criterion penalizes all errors equally, whereas the MSE criterion penalizes bigger errors more strongly. The MAPE criterion is the accepted industry standard for measuring load forecast quality. The MAE criterion is typically used for temperature and humidity forecasts. In RSE, the denominator is the error of the naive predictor that always forecasts the last actual value, which is the best that can be done without a model.
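For illustration, the criteria (14)–(18) translate directly into code (Eq. (18) is shown per point, with the day's peak supplied by the caller; function names are ours):

```python
import numpy as np

def mae(actual, forecast):
    """Eq. (14): mean absolute error."""
    return np.mean(np.abs(np.asarray(actual, float) - forecast))

def mape(actual, forecast):
    """Eq. (15): mean absolute percentage error, in percent."""
    a = np.asarray(actual, float)
    return np.mean(np.abs(a - forecast) / a) * 100.0

def mse(actual, forecast):
    """Eq. (16): mean squared error."""
    return np.mean((np.asarray(actual, float) - forecast) ** 2)

def rse(actual, forecast):
    """Eq. (17): squared error relative to the naive last-value predictor."""
    a = np.asarray(actual, float)
    return np.sum((a - forecast) ** 2) / np.sum(np.diff(a) ** 2)

def re_daily(actual_i, forecast_i, peak_load):
    """Eq. (18), for one point of the day."""
    return abs(actual_i - forecast_i) / peak_load
```

An RSE below 1 thus means the forecaster beats the naive last-value predictor; above 1, it is worse.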

4.6. GENERAL PROCEDURE FOR NN-BASED STLF

The general procedure that must typically be followed when using the NN approach for STLF involves the following actions:


I. Specify the variables (targets) to be predicted and identify the factors that have a significant influence on the targets (typical targets were mentioned in the introduction).

II. Select the structure of the NN to be used for each variable to be forecasted (number of layers, connectivity, number of nodes).

III. Collect (or assure) a sufficient amount of data for the training and test data sets.

IV. Specify the inputs to the NN. Potential input candidates are: time of the day, day of the week, season of the year, outdoor temperature (max, min, average values), cloud cover, wind speed, lagged load values (at past points in time), weather-related factors at past times, and weather forecasts.

V. Scale/normalize the collected data as described in Section 4.4. For the BP algorithm it is only required that the input to each node lies in the interval [0, 1]. The normalization can be done for all the input channels together, for each input channel separately or, finally, by considering groups of channels.

VI. Other issues that must be considered with regard to the NN include: the learning rule, the activation function, the learning rate, the momentum factor, and the NN environment. The number of hidden layers and the number of nodes in the hidden layers are problem dependent and are usually selected empirically. Special care should be taken to avoid overtraining [100].

5. Fuzzy Logic-Based Forecasting Methodology

Fuzzy logic-based systems (or simply fuzzy logic systems) represent a class of adaptive systems which, like neural networks, do not need the knowledge of a mathematical model of the process under study, and so they are described not by mathematical equations but by fuzzy (linguistic) rules. Fuzzy systems have two important features:

– They allow the use of linguistic expressions to describe the behavior of the system at hand, and so they are able to imitate the actions/decisions of human operators/dispatchers.

– They are inherently nonlinear, and so they are able to cope with the nonlinear situations of real practice.

The typical structure of a fuzzy logic (FL) system has the form of Figure 3 and consists of four main units, namely:

– Fuzzification unit (FU),
– Knowledge base (KB),
– Fuzzy inference engine (FIE),
– Defuzzification unit (DU).

The fuzzification unit carries out the following functions:

– Measures the values of the inputs,


Figure 3. General structure of a FL system.

– Maps the input ranges of values to proper universes of discourse (supersets),
– Fuzzifies the incoming data, i.e., converts them to a suitable fuzzy form.

The KB unit involves a numeric data base (DB) section and a fuzzy (linguistic) rule-base (FRB) section. The DB section involves all the numeric information available to perform the fuzzy reasoning. The FRB section involves the decision goals and strategies (usually provided by human experts) in fuzzy (linguistic) form. The fuzzy inference engine constitutes the heart of the FL system and contains the required decision-making logic (fuzzy reasoning, such as the generalized modus ponens rule, Zadeh's max–min composition rule, etc.).

The defuzzification unit performs the following tasks:

– Maps the range of output variables into corresponding universes of discourse,
– Defuzzifies the results of the FIE, i.e., converts them to non-fuzzy (numeric) form.

The rules involved in the FRB are usually obtained by interviewing expert operators and very rarely come out of mathematical analysis or computer simulation. The FRB rules are of the IF–THEN type. In general, the rules have many inputs and many outputs (MIMO), but it can be shown that any MIMO rule is equivalent (and can be converted) to a set of multi-input single-output (MISO) rules.

A standard fuzzy rule with two inputs and one output has the form:

Ri: IF x is Ai AND y is Bi THEN z is Ci

which involves two fuzzy premises (assumptions), namely "x is Ai" and "y is Bi", and one consequent (result), i.e., "z is Ci", where Ai, Bi and Ci are fuzzy sets. The whole rule is represented by a fuzzy relation in the fuzzy Cartesian product U × V × W, i.e.,

μ_{Ri} = μ_{(Ai AND Bi THEN Ci)}(u, v, w),

where u ∈ U, v ∈ V and w ∈ W, and U, V, W are the respective universes of discourse. The simplest way to calculate μ_{Ri} is provided by Mamdani's minimum rule, i.e.,

μ_{Ri}(u, v, w) = min{ μ_{Ai}(u), μ_{Bi}(v), μ_{Ci}(w) },  (19)


where μ_{Ai}(u), μ_{Bi}(v) and μ_{Ci}(w) are the membership functions of the respective fuzzy sets. If the FRB contains a total of n rules (Ri, i = 1, 2, . . . , n), then it can be regarded as a unique relation R, where

R = ⋃_{i=1}^{n} Ri.  (20)

Suppose now that at a certain instant of time we have the fuzzy input values "x is A′" and "y is B′" and we want to determine the result of applying these values to the rule base Ri (i = 1, 2, . . . , n). This can be found via Zadeh's max–min composition rule "∘" as:

(A′, B′) ∘ ⋃_{i=1}^{n} Ri = ⋃_{i=1}^{n} (A′, B′) ∘ Ri,  (21)

where the max–min operator "∘" is defined as:

B = A ∘ R:  μ_B(y) = max_x { min{ μ_A(x), μ_R(x, y) } },  (22)

with μ_R(x, y) = min{ μ_A(x), μ_B(y) }, and μ_A(x), μ_B(y) being the membership functions of the fuzzy sets A ∈ U and B ∈ V.

The right-hand side of (21) suggests that, instead of applying the premise (fact) (A′, B′) to the knowledge base ⋃_{i=1}^{n} Ri as a whole, one can apply (A′, B′) to each rule Ri of the KB separately. Typical forms of membership functions used for fuzzification are the triangular (symmetric), the trapezoidal and the bell-shaped functions (Figure 4).

The mathematical expressions for the first two membership functions are obvious. The expression for the bell-shaped (non-Gaussian) function is

μ_A(x) = 1 / [1 + ((x − x₀)/a)²],  a > 0, x ∈ R.  (23)

The typical methods for defuzzification are the following:

– maximum value (or height) method (HM),
– mean of maxima (MOM) method,
– center of gravity (COG) method.

Referring to Figure 5, the results of defuzzification (x_def) by each one of the above methods are as follows:

HM: x_def = arg max_{x_i} μ_A(x_i) = x₃.  (24)

MOM: x_def = (1/2)(x₂ + x₃) if the two maxima μ_A(x₂) and μ_A(x₃) are used,  (25a)

x_def = (1/3)(x₂ + x₃ + x₄) if three maxima are used.  (25b)


Figure 4. Typical membership functions: (a) triangular, (b) trapezoidal, (c) bell-shaped.


Figure 5. A typical membership function for the variable x with discrete values x_i (i = 1, 2, . . . , 5).

COG: x_def = Σ_{i=1}^{m} x_i μ_A(x_i) / Σ_{i=1}^{m} μ_A(x_i),  m = 5.  (26)
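For a discretized membership function, the three defuzzifiers (24)–(26) can be sketched as follows (function names are ours):

```python
import numpy as np

def hm_defuzzify(x, mu):
    """Eq. (24): the (first) point where mu is maximal."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    return x[np.argmax(mu)]

def mom_defuzzify(x, mu):
    """Eq. (25): the mean of all points where mu is maximal."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    return x[mu == mu.max()].mean()

def cog_defuzzify(x, mu):
    """Eq. (26): center of gravity of the membership values."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    return (x * mu).sum() / mu.sum()
```

Unlike HM and MOM, the COG value moves continuously as the membership values change, which is one reason for its popularity.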

The method most frequently used is the COG method.

An m-input/n-output FL system consists of n rules of the general type

IF x₁ is A₁^j AND x₂ is A₂^j . . . AND x_m is A_m^j THEN y_j is C_j,  (27)

where A_i^j (i = 1, 2, . . . , m; j = 1, 2, . . . , n) are fuzzy sets described by the membership functions μ_{A_i^j}(x_i).

The number of rules required for achieving good results is reduced considerably by using rules of the Takagi–Sugeno (T–S) type [89, 90, 93]:

R^{(j)}: IF x₁ is A₁^j AND . . . AND x_m is A_m^j THEN y_j = c₀^j + c₁^j x₁ + · · · + c_m^j x_m  (jth rule; j = 1, 2, . . . , n),  (28)

where the output y_j is a linear function of the inputs x₁, x₂, . . . , x_m; the c_i^j are real-valued output coefficients, the A_i^j are fuzzy sets with membership functions μ_{A_i^j}(x_i), and the overall output is given by:

y = Σ_{j=1}^{n} y_j Π_{i=1}^{m} μ_{A_i^j}(x_i) / Σ_{j=1}^{n} Π_{i=1}^{m} μ_{A_i^j}(x_i).  (29)

The main problem in the above type of reasoning (which is called Takagi–Sugeno (T–S) reasoning) is the identification of the structure of the fuzzy model, i.e., the selection of the input variables and the proper input space partitioning. This can be done by several techniques such as:

– experience,


– combinatorial methods,
– fuzzy clustering,
– orthogonal least squares (OLS),
– adaptive resonance theory (ART),
– other optimization methods (e.g., genetic methods).

Once the input space is partitioned, the premise parameters (i.e., the parameters of the membership functions) are determined. To each input space partitioning and each rule j there corresponds a hypercell (or hypercube). The membership function of the jth rule's hypercell is given by

μ_j = Π_{i=1}^{m} μ_{A_i^j}(x_i).  (30)

Therefore, using (30), the total output of the fuzzy system (Equation (29)) takes the form:

y = Σ_{j=1}^{n} y_j μ_j / Σ_{j=1}^{n} μ_j.  (31)
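Equations (30)–(31) amount to a product-then-weighted-average computation, which can be sketched as follows (an illustration; the argument layout is our assumption):

```python
import numpy as np

def ts_inference(grades, y_rules):
    """Takagi-Sugeno output, Eqs (30)-(31).
    grades[j][i] is the membership grade of input x_i in rule j;
    y_rules[j] is the crisp consequent value y_j of rule j."""
    mu = np.prod(np.asarray(grades, float), axis=1)   # firing strengths, Eq. (30)
    y = np.asarray(y_rules, float)
    return (mu * y).sum() / mu.sum()                  # weighted average, Eq. (31)
```

In the full T–S rule (28) each `y_rules[j]` would itself be computed as c₀^j + Σ_i c_i^j x_i before the averaging.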

A special case of the general T–S rule (28) is to assume that

y_j = c₀^j = w_j.  (32)

In this case we have the so-called simplified T–S rules. Due to the fact that the outputs of the T–S rules are deterministic (crisp) functions of the inputs, the fuzzy reasoning with T–S rules, i.e., the T–S fuzzy reasoning, is also known in the literature as "functional reasoning".

A STLF method based on the simplified T–S fuzzy reasoning was proposed in [68]. The membership functions used for the piecewise-linear approximation of the nonlinear output function (assuming for example a SISO function) have the shape shown in Figure 6 and are algebraically expressed as

μ_{A_{ij}}(x_i) =
  (x_i − a_{i(j−1)}) / (a_{ij} − a_{i(j−1)}),  a_{i(j−1)} ≤ x_i ≤ a_{ij},
  1 − (x_i − a_{ij}) / (a_{i(j+1)} − a_{ij}),  a_{ij} ≤ x_i ≤ a_{i(j+1)},
  0,  x_i ≤ a_{i(j−1)} or x_i ≥ a_{i(j+1)},  (33)

where μ_{A_{ij}}(x_i) is the value of μ_{A_{ij}} corresponding to x_i (grade of x_i), and a_{ij} is the center of the membership function μ_{A_{ij}}.

It is easy to verify that for the membership functions (33) the denominator of (31) is equal to unity, i.e.,

Σ_{j=1}^{n} μ_j = Σ_{j=1}^{n} Π_{i=1}^{m} μ_{A_i^j}(x_i) = 1,  (34)


Figure 6. Takagi–Sugeno piecewise-linear approximation of a single-variable nonlinear function: (a) the nonlinear function, (b) the corresponding membership functions.

and so the output y of this simplified T–S system is given by (see (32))

y = Σ_{j=1}^{n} w_j μ_j.  (35)
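With the triangular memberships (33) the firing strengths form a partition of unity (34), so the simplified T–S output (35) is simply piecewise-linear interpolation between the w_j. A single-input sketch (names are ours):

```python
import numpy as np

def tri_memberships(x, centers):
    """Eq. (33) for one input: triangular memberships with sorted
    centers a_j; inside the range the grades sum to 1 (Eq. (34))."""
    c = np.asarray(centers, float)
    mu = np.zeros(len(c))
    # locate the interval [c[j], c[j+1]] containing x
    j = int(np.clip(np.searchsorted(c, x) - 1, 0, len(c) - 2))
    t = (x - c[j]) / (c[j + 1] - c[j])
    mu[j], mu[j + 1] = 1.0 - t, t
    return mu

def simplified_ts(x, centers, w):
    """Eq. (35): y = sum_j w_j * mu_j(x)."""
    return float(tri_memberships(x, centers) @ np.asarray(w, float))
```

This makes the geometric meaning of Figure 6 concrete: the w_j are the ordinates of the broken-line approximation at the centers a_j.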

The performance of this functional reasoning system can be optimized in two ways, namely by first selecting the number and location of the input membership functions μ_{A_i^j}(x_i), and then selecting the coefficients (weights) w_j of the consequent part (output) in some optimal way. This is called "optimal selection of the structure of the FL system". One way to select the number and position of the input membership functions is to use the coding shown in Figure 7 [68].

This is done by first dividing the range of each input variable (x₁ in Figure 7) into a certain number of sections (13 sections in Figure 7). Then we assign a "0" or "1" to all the edges of these sections (the two boundary edges of the range are exempted) as shown in Figure 7, i.e., placing the 1's at the points where the membership functions will take their maximum values (centers) and the 0's at all the other points (where the membership functions will be increasing or decreasing). In the multi-input case, the coding of each variable is organized serially. Then, using this input variable coding, we minimize by simulated annealing the error (cost) function


Figure 7. Coding of the membership functions to be optimally selected.

J = (1/N) Σ_{l=1}^{N} (y_{dl} − y_l)²,  (36)

where y_l is the lth actual output value and y_{dl} is the lth target value. The minimization of (36) is performed via the gradient rule (see (5) and (6)) which, taking into account that y_l is given by (35), gives

Δw_k = η (y_{dl} − y_l) · μ_k,  (37)

where η is the learning rate.

To evaluate the resulting FL system, the following index was used in [68]:

I = (J⁰/J_max) + λ · m,  (38)

where J⁰ is the optimal value of the error function (36) corresponding to the resulting optimal fuzzy structure (input membership functions and output coefficients w_k), J_max is the initial value of J (corresponding to the initial structure used to start the iterations of (37)), λ is a parameter, and m is the number of membership functions in the optimal fuzzy structure. Clearly, a fuzzy structure is better not only when J⁰ is smaller, but also when m is small (i.e., when the structure is simple).

6. Neurofuzzy Forecasting Methodology

To improve the performance of either the pure NN-based or the pure FL-based forecasting, some researchers have applied several combinations of neural networks and FL systems. The aim of this section is to review this hybrid neurofuzzy forecasting methodology [4, 8, 55, 61, 86, 94, 108, 110, 111].

Pure FL systems suffer from the lack of a specific method to determine membership functions and from the lack of learning capability; these drawbacks can be overcome by NNs driven by fuzzy reasoning. Neural networks are used in NF systems to tune the membership functions of the fuzzy variables involved, i.e., to adjust the fuzzy sets (linguistic labels) and determine the membership functions. FL can convert expert


Figure 8. The ANFIS architecture.

knowledge directly into fuzzy rules, but it takes a lot of time to design and adjust the fuzzy sets (labels). NF systems possess improved performance and reduced trial-and-error (computational) time. This is achieved via the learning capability of the NN.

Three classical neuro-fuzzy systems are:

– ANFIS: adaptive neuro-fuzzy inference system,
– GARIC: generalized approximate reasoning-based intelligent control,
– Fuzzy ART: fuzzy adaptive resonance theory system.

A brief review of them follows.

6.1. ANFIS

The architecture of this NF system is shown in Figure 8 and consists of five layers. The knowledge base involves a set of K fuzzy rules of the Takagi–Sugeno type (see also (28)):

R_j: IF x₁ is A₁^j AND . . . AND x_N is A_N^j THEN y = f_j = p_{j1} x₁ + · · · + p_{jN} x_N + r_j

for j = 1, 2, . . . , K.

Every node of the first layer corresponds to an available fuzzy set A_i^j of the inputs. Its output gives the membership value μ_{A_i^j}(x_i) of the input x_i to the fuzzy set A_i^j. For example, the bell-type membership function


μ_{A_i^j}(x_i) = 1 / ( 1 + [ ((x_i − c_i^j)/a_i^j)² ]^{b_i^j} )  (39)

involves three parameters (a_i^j, b_i^j, c_i^j) for each fuzzy set A_i^j.

The number of nodes in the second layer is equal to the number of rules in the rule bank. The output of each node gives the firing strength w_j of the corresponding rule j as

w_j = Π_{i=1}^{N} μ_{A_i^j}(x_i).  (40)

The third layer performs a normalization of the firing strengths of the second layer:

w̄_j = w_j / Σ_{s=1}^{K} w_s.  (41)

In the fourth layer, w̄_j is multiplied by the function f_j of the consequent part of the rule, i.e.,

w̄_j f_j = w̄_j (p_{j1} x₁ + · · · + p_{jN} x_N + r_j).  (42)

The fifth layer performs the defuzzification:

y = Σ_{j=1}^{K} w̄_j f_j = Σ_{j=1}^{K} w_j f_j / Σ_{j=1}^{K} w_j.  (43)

One can observe that ANFIS is actually a NN-like (five-layer) implementation of the T–S reasoning system (see (31)).
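A forward pass through the five ANFIS layers can be sketched in vectorized form (the parameter names follow (39) and the rule definition; the array shapes are our assumptions):

```python
import numpy as np

def anfis_forward(x, a, b, c, p, r):
    """ANFIS forward pass, Eqs (39)-(43), for K rules over N inputs.
    x: (N,) inputs; a, b, c: (K, N) bell-function parameters;
    p: (K, N) and r: (K,) Takagi-Sugeno consequent coefficients."""
    mu = 1.0 / (1.0 + (((x - c) / a) ** 2) ** b)  # layer 1, Eq. (39)
    w = mu.prod(axis=1)                           # layer 2, firing strengths, Eq. (40)
    w_bar = w / w.sum()                           # layer 3, normalization, Eq. (41)
    f = p @ x + r                                 # rule consequents f_j
    return float((w_bar * f).sum())               # layers 4-5, Eqs (42)-(43)
```

In training, the premise parameters (a, b, c) are typically tuned by gradient descent while the consequent parameters (p, r) can be fitted by least squares, since the output is linear in them.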

6.2. GARIC

The architecture of GARIC is shown in Figure 9. It consists of two networks, i.e., the action selection network (ASN), which converts the input state to an action F, and the action evaluation network (AEN), which evaluates the current state x of the system (adaptive critic element) and provides the evaluation signal e(x). Then, the stochastic action modifier (SAM) generates the final action F′ from F and the internal reinforcement signal r′ = r − e(x), where r is the reinforcement signal provided by the physical system.

Since the GARIC architecture is suitable only for controller design/implementation, it will not be discussed further here.

6.3. FUZZY ART

The fuzzy ART network is an unsupervised algorithm that receives a stream of input patterns and creates hypercells (recognition categories) [110, 111]. Therefore


Figure 9. The GARIC architecture.

it is suitable for performing the fuzzy input-space partitioning task. The inference model used is the Takagi–Sugeno model. The parameters of the ART dynamics are:

• a choice parameter α > 0,
• a learning rate parameter β ∈ [0, 1],
• a vigilance parameter ρ ∈ [0, 1].

The parameters (weights, heights) w_{ij} = {w_{ij}^r, r = 1, . . . , M} of the input membership functions μ_i^j(x) are adaptable. The initial values of all weights are set to 1 (categories uncommitted). The values of all weights can only decrease with time. When an input is presented, a choice function CF is evaluated for each category. The category with the maximum CF is selected. If several categories have the same CF value, the one with the smallest index is selected. The choice parameter α favors the categories that maximize |w_{ij}| (smallest categories). After selection, a category is said to be committed. Resonance occurs when the following criterion is satisfied:

|I_{ij} ∧ w_{ij}| / |I_{ij}| ≥ ρ  (vigilance criterion),  (44)

where I_{ij} is the current input vector {I_{ij}^r ∈ [0, 1]}, and ∧ is the "minimum" operator: a ∧ b = min{a, b}. The weight updating rule is

w_{ij}^{new} = β (I ∧ w_{ij}^{old}) + (1 − β) w_{ij}^{old},  (45)

where ij is the index of the resonant category. Fast learning occurs for β = 1. In addition to the weights of the membership values, adaptive parameters are the


membership functions' slopes and the coefficients c_i of the consequent parts. At each iteration an input vector and an output vector are presented. Every P time steps all committed nodes (rules) are checked, and the one with the lowest local mean square error (lmse) is split, provided that it has been activated at least P times. The best split is determined by an appropriate performance index, called the tentative performance index (tpi). The output calculation is performed using Equation (31), by calculating the firing strengths (i.e., the values of the membership functions) μ_j of all committed fuzzy rules and activating the output of all fuzzy rules with firing strengths higher than a threshold h (h = 0.01 in our experiments). The rule output y_j is computed as

y_j = c₀^j + c₁^j x̄₁ + · · · + c_m^j x̄_m,  if μ_j > 0.001,  (46)

where x̄_i = (x_i − mean)/width. The output error is E = y − y_d. The mean squared error (MSE) and mean absolute error (MAE) are updated as running averages:

MSE_new = 0.995 MSE_old + 0.005 E²  and  MAE_new = 0.995 MAE_old + 0.005 |E|.  (47)

If E > 10 MAE, an uncommitted rule (node) is used (added) and initialized as before. This rule addition process is activated very rarely. The input weights and membership function slopes, as well as the output weights, are updated via the δ rule. Only the activated rules participate in the weight updating. Rule splitting and rule adding, and the associated updating, take place as long as there are uncommitted rules, i.e., during the first stages of learning. This is also true for the fuzzy ART learning, which was terminated in our experiments after 25000 time steps. It is remarked that a proper selection of P must be made.
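One presentation step of the category choice, vigilance test (44) and update (45) can be sketched as follows; the standard fuzzy-ART choice function |I ∧ w| / (α + |w|) is assumed here, since it is not spelled out above:

```python
import numpy as np

def fuzzy_art_step(I, W, alpha=0.01, beta=1.0, rho=0.75):
    """One fuzzy-ART presentation. I: input in [0,1]^M; W: (n_cat, M)
    weight matrix (rows initialized to 1 = uncommitted). Returns the
    resonant category index (or None) and the updated weights."""
    m = np.minimum(I, W)                              # fuzzy AND (min) per category
    cf = m.sum(axis=1) / (alpha + W.sum(axis=1))      # assumed choice function
    for j in np.argsort(-cf, kind="stable"):          # best choice first, ties -> smallest index
        if m[j].sum() / I.sum() >= rho:               # vigilance criterion, Eq. (44)
            W[j] = beta * m[j] + (1 - beta) * W[j]    # learning rule, Eq. (45)
            return int(j), W
    return None, W                                    # no resonance: a new node would be added
```

With β = 1 (fast learning) the resonant row is set directly to I ∧ w_old, which is why the weights can only decrease with time, as noted above.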

6.4. A DEDICATED NF SYSTEM FOR STLF

The architecture of the NF system designed in [86] especially for STLF is shown in Figure 10.

In this system the following input variables were used: the daily maximum and minimum temperatures (max t, min t), the average temperature for the previous date (pavt), the rain forecast (instead of percentage humidity), the distance of a day from a special day (Sunday or a public holiday), and an input to reflect the seasonal trends in the load (trend). The adopted fuzzification of these variables is shown in Figure 11. The actual daily load values were fuzzified via the linear membership function

μ_A(x) = (x − x_min) / (x_max − x_min).

The rules used are of the following form:

“IF x is low and y is high THEN z is medium”, (48)


Figure 10. Architecture of a dedicated NF STLF system.

Figure 11. Fuzzification of NF STLF input variables (all membership functions are symmetrical for easy computation).

where x and y are the names of input variables, and z is an output variable.

The membership (truth) value of the antecedent part of each rule was computed via the "min" operator, while the membership function of the output variable z was calculated using the "product" operator. For example, in the above rule we have

μ_{(A,B)} = min{ μ_A(low), μ_B(high) },  (49a)

μ_C(z) = μ_{(A,B)} · μ_C(medium).  (49b)

The defuzzification was performed by the COG method.

The historical data was arranged into three separate categories, namely weekday (Monday to Friday) data, Saturday data, and Sunday/public holiday data. To capture the shape of the load curve for these three day types, three similar 3-layer NNs were used, with the logistic sigmoid function, 36 nodes in the hidden layer, and the BP/gradient weight updating (learning) rule. The NNs were trained by


Figure 12. Fuzzification of the change in load P.

feeding them a sequence of examples (patterns) received from the fuzzy front-end processor. The preparation of the training set was done as shown in Figure 10. The change in load (due to weather variations and other factors) was represented by a fuzzy variable P, which was fuzzified as shown in Figure 12.

It was assumed that the influence of weather on load is the same during each time segment: night (12 p.m.–6 a.m.), morning (6 a.m.–12 noon), afternoon (12 noon–6 p.m.), and evening (6 p.m.–12 p.m.). Therefore each of the training patterns was arranged as: [P(night), P(morning), P(afternoon), P(evening), target-load]. The numbers of neurons required to represent these variables were 5, 5, 5, 5, and 24, respectively.

7. Other Hybrid SC-Based Forecasting Methodologies

In this section some other hybrid soft computing methodologies used, or suitable to be used, for STLF will be briefly discussed. These include:

– hybrid KB and NN methodology,
– hybrid chaos-based and FL methodology,
– hybrid FL/NN and GA methodology.

7.1. HYBRID KB AND NN METHODOLOGY

Integrating expert systems with neural networks is a conceptually rich process with many possibilities and alternative combinations. The two fundamental issues to be considered are how to represent the knowledge within the model and how to conduct the reasoning. Some generic hybrid KB–NN configurations proposed in the literature are shown in Figure 13 [64]. The simplest structure is shown in Figure 13(a), where the output of the NN (or the ES) is fed to the ES (or the NN) in a sequential control mechanism. A second ES can be introduced for collecting the input to the NN and/or analyzing the results, as shown in Figure 13(b). In a more sophisticated configuration the NN (or NNs) may be embedded into the ES (Figure 13(c)), in which case the information learned by the NN is incorporated with the facts and rules in the ES's inference engine. Another possibility, where the


Figure 13. Some possible ES/NN configurations: (a), (b) – sequential configurations; (c), (d) – more sophisticated configurations.

ES and NN can be combined with other decision making/information processing elements (such as decision support systems (DSS), genetic algorithm (GA) procedures, case-based reasoning (CBR), etc.), is shown in Figure 13(d).

The hybrid ES/NN configuration employed in [103] for forecasting purposes is of the sequential type and has the form shown in Figure 14. This system was applied to a gas consumption data base and gave a minimum MAE equal to 0.252 with the best NN structure and the best training data set for a 6-month forecasting period.

The KB/ES provided the NN with data sets selected from a 4-year data bank (January 1988–December 1991) and was tested for a 6-month period (January 1992–June 1992). The knowledge used was distinguished in two categories:

(i) knowledge used for data selection,
(ii) knowledge used for data classification.

For the data selection process, the rules that were used concern mainly the following issues:

– The training data set size.
– The periodicity of data and the existence of seasonal patterns.


Figure 14. Hybrid ES/NN system for industrial forecasting.

– The limits and the trends (increasing or decreasing) of the training data in relation to the limits of the forecasting input data.

– The evolution of the forecasting error in relation to the current training data set.

– The dimension of the input array (number of NN inputs) in relation to the forecasting error.

– The size of the NN (number of hidden layers and neurons) in relation to the size of the training data set and the needed training epochs.

For the data classification process the following rules can be used:

– Rules for creating classes of data sets (e.g., limits of the class, increasing or decreasing trends, input vector dimensionality).

– Frames and subframes for representing classes and subclasses of data sets with generic and more specific characteristics.

– Rules for extracting the data set classes out of the frame knowledge structures.

The expert system finds a training data set (out of the entire data bank available) which gives the minimum absolute forecasting error for specific input patterns that correspond to a number of consecutive forecasting periods. In this way a small-size training set is selected that gives the best results for the specific NN inputs. This data set, the trained NN, and their corresponding forecasting inputs are stored and used again when a similar input pattern appears. During forecasting, the KB system monitors the forecasting error and, using proper rules, decides if it is necessary to retrain the NN with a new updated data set. If so, the KB system updates the data set and retrains the NN until a minimum error is achieved again, giving the NN the possibility to adjust itself to any changes in the process/environment behavior.
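The monitoring-and-retraining loop just described can be sketched as follows. This is a minimal illustration only: the error threshold and the running-mean "model" are assumptions standing in for the actual KB rules and the NN of [103].

```python
# Sketch of a KB-monitored retraining loop (illustrative; the system in
# [103] retrains a neural network, here replaced by a toy running-mean model).

def mape(actual, forecast):
    # mean absolute percentage error, in percent
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actual, forecast)) / len(actual)

class MonitoredForecaster:
    def __init__(self, threshold=5.0):
        self.threshold = threshold   # assumed retraining trigger (% error)
        self.mean = 0.0              # toy "model": mean of the training set
        self.retrain_count = 0

    def retrain(self, data):
        # stand-in for retraining the NN on an updated data set
        self.mean = sum(data) / len(data)
        self.retrain_count += 1

    def forecast(self):
        return self.mean

    def step(self, history, actual):
        # KB rule: if the monitored error is too large, update and retrain
        error = mape([actual], [self.forecast()]) if self.mean else 100.0
        if error > self.threshold:
            self.retrain(history)
        return self.forecast()
```

A caller would feed the loop the latest observation and the updated data set at each forecasting period; retraining fires only when the monitored error exceeds the threshold.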


7.2. HYBRID CHAOS-BASED/FL METHODOLOGY

The classical approach to the study of almost all nonperiodic, complex and irregular phenomena is to consider them as indeterministic phenomena subordinate to randomness, and to use the theory of random or stochastic processes (signals). However, physical phenomena of the above type can often be simulated (generated) using a differential or difference equation governed by determinism (i.e., the initial conditions determine unambiguously the subsequent states). This is known as deterministic chaos of a dynamical system. Typical chaotic time series generated from nonlinear equations are the Lorenz series [59] and the logistic map [63]. Among the phenomena that possess a chaotic behavior we mention the heartbeat, earthquakes, turbulent flow, sunspots, atmospheric temperature change, neuron action, and the change of the electric power load.

Therefore the STLF can be studied using a chaotic approach. A hybrid STLF technique combining the chaotic time series representation of the electric load with a local fuzzy reconstruction method was proposed in [33–35]. The purpose of this section is to provide a short account of this technique.

The STLF is achieved by reconstructing the observed load time series in an n-D state space using Takens' embedding theorem [92]. The local reconstruction is based on the use of the vectors neighboring the data vector that includes the latest observed data. If the neighboring vectors are linearly dependent, the STLF can be done using the Gram–Schmidt orthonormalization [38] and the tessellation method [65]. However, for nonlinear dependence the above methods do not give accurate results. A good alternative is the local fuzzy reconstruction (LFR) method proposed in [33–35].

The basic steps of the chaotic method for STLF are the following:

Step 1: Observe the load time series at constant intervals: y(t), y(t − τ), y(t − 2τ), . . . , y(t − (n − 1)τ), . . . .

Step 2: Establish the behavior of the time series. If it is chaotic, it can be assumed that it follows a certain deterministic law.

Step 3: Determine (estimate) the underlying deterministic law.

Step 4: Forecast the data of the near future (until the deterministic causality is lost) using observed data at a certain time point, since chaos has a sharp dependency on initial conditions.

Step 5: Plot the data vector (including the latest time series data observed) in the n-D reconstructed state space.

Step 6: Using the data vector neighboring the plotted data vector and the data vector at s time periods ahead, determine the target predicted value via the LFR method.

Figure 15. (a) Performance of X(i) in the original time series. (b) Dynamic transition from X(i) to X(i + s).

Takens' embedding theorem may be summarized as follows. Consider a time series {y(t)} and an n-D observed vector [y(t), y(t − τ), . . . , y(t − (n − 1)τ)]^T, where τ is the sampling step. This vector represents a point in R^n. A trajectory can be plotted in the n-D reconstructed state space by changing t. Under the assumptions that the target system is a deterministic dynamic system, and that the observed time series is obtained via an observation system corresponding to a continuous mapping from R^n to R, the reconstructed trajectory is an embedding of the original trajectory if n is sufficiently large (n is known as the "embedding dimension"). The condition which guarantees the achievement of embedding is

n ≥ 2m + 1,   (50)

where m is the state space dimension of the original dynamical system. Note that this is a sufficient condition, and so in some cases embedding can be obtained even with n < 2m + 1.

The state space and the attractor are reconstructed from the observed time series according to Takens' theorem on embedding. Then, based on the attractor, a trajectory like the above is computed and nonlinear dynamics is presumed. Specifically, the load time series y(t), observed at constant sampling time periods, is embedded in an n-D state space with n and τ (sampling period) specified by Takens' theorem. This reconstruction process gives X(t) = [y(t), y(t − τ), . . . , y(t − (n − 1)τ)]^T. The data vector Z(T) = [y(T), y(T − τ), . . . , y(T − (n − 1)τ)]^T obtained from the last observation is drawn in the n-D reconstructed state space, and the neighboring data vectors are denoted by X(i), where i ∈ N(Z(T)), with N(Z(T)) being the set of indices i of the X(i) neighboring Z(T). The behavior of X(i) in the original time series is similar to that of Z(T) (see Figure 15(a)). This is the key feature.
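The reconstruction of the delay vectors X(t) can be made concrete with a few lines of code; the following is a minimal, list-based sketch:

```python
def delay_embed(y, n, tau):
    """Return the delay vectors X(t) = [y(t), y(t - tau), ...,
    y(t - (n - 1) * tau)] for every t at which the full vector exists."""
    start = (n - 1) * tau
    return [[y[t - j * tau] for j in range(n)]
            for t in range(start, len(y))]
```

For a series of length L, the first complete vector exists at t = (n − 1)τ, so the routine returns L − (n − 1)τ vectors.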

Since X(i) is past data, the state X(i + s) at the future time i + s (s sampling periods ahead) is already known, as illustrated in Figure 15(b). The predicted value Ẑ(T + s) of the data vector Z(T + s) is obtained from X(i + s). The predicted value ŷ(T + s) of y(T + s) is obtained from Ẑ(T + s) using the local fuzzy reconstruction method described below. The dynamics of the involved chaotic system is described in rule form as:

IF Z(T ) = X(i) THEN Z(T + s) = X(i + s), (51)

where X(i) is the data vector neighboring Z(T). This means that as long as the deterministic causality is preserved (s does not exceed a certain value), the transition from Z(T) to Z(T + s) is approximately equivalent to that from state X(i) to X(i + s). Clearly, under the assumption that the attractor embedded in the n-D reconstructed state space is a smooth manifold, the trajectory from Z(T) to Z(T + s) is influenced by the Euclidean distance from Z(T) to X(i).

Component-wise the chaotic dynamics rule takes the form:

IF ξj(T) = yj(i) THEN ξj(T + s) = yj(i + s)  (j = 1, 2, . . . , n),   (52)

where ξj(T) is the jth component of X(i) = [y(i), y(i − τ), . . . , y(i − (n − 1)τ)]^T, the data vector neighboring Z(T) in the n-D reconstructed state space, ξj(T + s) is the jth component of X(i + s) = [y(i + s), y(i + s − τ), . . . , y(i + s − (n − 1)τ)]^T, and n is the embedding dimension.

Similarly, the trajectory from Z(T) to Z(T + s) is influenced by the Euclidean distance from Z(T) to X(i). Therefore, to accommodate the nonlinear chaotic characteristics, the rule (52) can be replaced by the fuzzy rule:

IF ξj(T) = yj(i) THEN ξj(T + s) = yj(i + s)  (j = 1, 2, . . . , n),   (53)

where for simplicity one can use δ-type (crisp) membership functions for yj(i) and yj(i + s).

Now, since Z(T) = [y(T), y(T − τ), . . . , y(T − (n − 1)τ)]^T, the jth component of Z(T) in the n-D reconstructed state space is given by yj(T), and the jth component of the predicted value Ẑ(T + s) of the data vector Z(T + s), s sampling periods ahead, is obtained as ξj(T + s) using the fuzzy implication dictated by (53).
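With δ-type (crisp) memberships, rule (51) reduces to a nearest-neighbor prediction in the reconstructed state space. The sketch below embeds the series, finds the past state nearest to the latest state Z(T), and reads off the value s steps ahead; the parameter choices in the usage are illustrative, not taken from [33–35].

```python
import math

def delay_embed(y, n, tau):
    # delay vectors keyed by their time index t
    start = (n - 1) * tau
    return {t: [y[t - j * tau] for j in range(n)]
            for t in range(start, len(y))}

def chaotic_forecast(y, n, tau, s):
    """Crisp (delta-membership) version of rule (51): find the past state
    X(i) nearest to the latest state Z(T) in Euclidean distance and return
    y(i + s) as the prediction of y(T + s)."""
    X = delay_embed(y, n, tau)
    T = len(y) - 1
    z = X[T]
    # candidates: past states whose future value y(i + s) is already known
    best_i = min((i for i in X if i + s <= T and i != T),
                 key=lambda i: math.dist(X[i], z))
    return y[best_i + s]
```

On an exactly periodic series the nearest neighbor lies one period back, so the prediction is essentially exact; on a chaotic series the accuracy degrades as s grows and causality is lost, as noted in Step 4.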

7.3. HYBRID FL/NN AND GA METHODOLOGY

The performance of FL- and NN-based methods depends on a number of structure/design parameters, which can be selected using suitable heuristics dictated by practical (or simulation) experience, or using suitable optimization methods. The use of genetic algorithms (GAs) for this purpose is a promising alternative. The aim of this section is to provide an overview of this alternative.

7.3.1. Genetic Algorithms

Genetic algorithms, originated by Holland in 1975, are population-based optimization algorithms that emulate biological evolution [19, 30, 107]. They behave as a computational analog of adaptive systems, using the principles of natural population genetics to evolve solutions to problems. In the traditional formulation, a random population of strings evolves through genetic steps of natural mechanism. Each genetic structure of the population (called a chromosome) represents an individual solution to the problem at hand. GAs belong to the general area of Evolutionary Computation. GAs employ three basic genetic operators:

– Selection (e.g., survival of the fittest).
– Crossover (i.e., recombination of the selected strings).
– Mutation (which introduces randomness in the search process).

A GA performs a parallel recombinative random search (genetic search) through the following steps:

• Initialization (t = 0)

  – Create a random population P(0) of parent strings.
  – Evaluate P(0).

• Iteration

  – Select two parent strings from P(t).
  – Combine these strings by applying the crossover and mutation operators.
  – Evaluate the produced children by computing their fitness values.
  – Set t = t + 1.
  – Continue until a terminating condition is met.
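The loop above can be sketched for a toy bit-string problem (maximizing the number of ones). Tournament selection of two is used as a simplified stand-in for fitness-proportional selection, and all control parameters below are illustrative assumptions:

```python
import random

def genetic_search(fitness, n_bits=16, pop_size=20, p_mut=0.02,
                   generations=60, seed=0):
    """Minimal GA: tournament selection, one-point crossover, bit-flip
    mutation, with the best individual tracked across generations."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        def pick():                          # tournament of two
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        children = []
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # mutation
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return best
```

For the "one-max" fitness (the sum of the bits), the search converges to a string of mostly ones within a few dozen generations.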

GAs employ a coding of the parameter set, not the parameters themselves. The basic coding technique is to use a binary representation. Other coding methods based on alphabets and figures exist; however, the use of a simplified coding is preferable. The initial population (of appropriate size) is generated by random numbers using this coding. The control parameters of a GA are: the population size, the probability of mutation, the number of crossover points, etc. The practical issues in GAs include: (i) the genetic representation scheme of the solution, which determines the chromosome structure, and (ii) the fitness function, which evaluates the performance of each individual.

The main advantages of a GA search are:

– It does not use any gradient information, and so it does not require continuity or convexity of the solution space.

– It has the capability to search the solution space by sampling several regions of the space in parallel.

– It uses the information acquired from the solution space already explored.

Of course, the efficiency of a GA depends heavily on the bit-string coding method used. Holland [30] has introduced the concept of "schemata" as the subset of the set of strings which possess given values at specific positions.


7.3.2. Hybrid GA-FL Methods

A GA-based method for the simultaneous design of membership functions and fuzzy control rules has been provided in [117]. Assuming triangular membership functions, the left and right widths of the functions, the locations of their peaks, and the fuzzy control rules corresponding to every possible combination of the input linguistic variables are the parameters to be optimized. These parameters are transformed into real-coded chromosomes (via a proportional scaling method). The offspring over these chromosomes are produced by rank-based reproduction, convex crossover, and nonuniform mutation. The fuzzy rules considered are of the MISO type:

IF a is Ai AND b is Bi THEN c is Ci (i = 1, 2, . . . , n)

and the problem is to design the premise membership functions µAi(x), µBi(y) and the conclusion (rule) membership functions µCi(c) simultaneously, on the basis of a selected criterion. Since all parameters to be determined are real, a real-number representation was used, in which each chromosome is coded as a vector of real numbers of the same length as the solution vector. The first block of genes xi of each chromosome x = [x1, x2, . . . , xN] is used to determine the parameters of the Ai membership functions, the next block of genes is used for the Bi membership functions, and the third block of genes is used for the Ci membership functions. The crossover operator selected among the available ones is the convex crossover, defined as

x′1 = λx1 + (1 − λ)x2,   x′2 = λx2 + (1 − λ)x1

for the two real-coded chromosomes x1 and x2, where 0 < λ < 1. To avoid the possibility that some offspring may be worse than their parents and some fitter chromosomes are lost in the evolution process, the selection operator is applied to an enlarged sampling space that contains all the parents and offspring, so that all of them have the same chance to compete for survival. To increase the effectiveness of the GA, the chromosomes are selected proportionally to their ranks rather than their actual evaluation values. This implies that the fitness is an integer from 1 to M, where M is the population size. The best chromosome has a fitness value equal to M and the worst chromosome a fitness value equal to 1. The detailed steps of the GA follow easily from the general steps defined previously.
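The convex crossover and the rank-based fitness assignment can be sketched as follows; this is an illustrative fragment in which a lower evaluation value is taken as better:

```python
def convex_crossover(x1, x2, lam):
    """Convex crossover of two real-coded chromosomes (0 < lam < 1)."""
    c1 = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    c2 = [lam * b + (1 - lam) * a for a, b in zip(x1, x2)]
    return c1, c2

def rank_fitness(population, evaluate):
    """Rank-based fitness: the best chromosome gets M and the worst gets 1,
    where M is the population size (lower evaluate() means better here)."""
    order = sorted(population, key=evaluate)          # best first
    M = len(population)
    return {id(x): M - rank for rank, x in enumerate(order)}
```

Both offspring stay inside the convex hull of their parents, which keeps the membership-function parameters within meaningful bounds.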

The above GA-based optimization method concerns the case where both the premise part and the conclusion part of the fuzzy rules work with fuzzy sets. A GA for optimizing FL systems with rules of the Takagi–Sugeno type (see Section 5, (27)) was proposed in [83]. The technique was developed (without loss of generality) for the simplified T–S model with symmetric triangular membership functions. The optimization criterion is

J = (1/2) Σ_{l=1}^{N} (y_l^d(x_i) − y_l(x_i))²


and has to be minimized by selecting (tuning) the membership function parameters (centers aij and widths bij) and the output coefficients wj (see (32)). The code adopted for the parameter vector (aij, bij, wj) is the binary {0, 1} code. For each binary string (chromosome) so obtained, the GA evaluates the optimization function J using the fitness function F = 1/J. The crossover operator is applied with a given probability (crossover rate) Pcross. The mutation operator is applied with a probability (mutation rate) Pmut, using a biased coin toss to determine if the operator must be carried out. The GA is terminated with a stopping rule based on the maximum number of generations. The final solution is not necessarily in the last population, but is the best element among the successive populations. Some other results on optimally tuning FL systems and extracting fuzzy rules by using GAs can be found in [17, 25, 32, 36, 51, 52, 62, 75, 104, 113].

7.3.3. Hybrid GA-NN and GA-FL-NN Methods

GAs can also be used to provide good initial values for the weights of NNs, or to completely train a NN. A general scheme for hybrid GA-NN systems is shown in Figure 16. On the top we have the GA. After the generation of the initial populations, an individual with high fitness (i.e., small output error of the NN) is selected to compete for evolution. The best individual is kept in each generation.

Figure 16. General scheme of hybrid GA-NN systems.


The genes of this individual are decoded to give the network weights. Then, the gradient technique is employed for the BP algorithm in the NN learning. Several variations and implementations of this general GA-NN scheme, including the selection of the NN or FL system structure, can be found in [16, 37, 54, 56, 80]. In particular, in [56] the problem of nonlinear dynamic modelling and identification is solved using NNs and GAs on the basis of multiobjective criteria. Three objective (cost) functions are considered, namely the L2-norm (Euclidean distance) and the L∞-norm (maximum difference) between the real system and its model, and the complexity measure of the nonlinear model. The multi-criteria optimization is performed using the method of inequalities, least squares and GAs. The GAs are also used for selecting the structure of the NN as part of the nonlinear model selection. The Volterra polynomial basis function network and the Gaussian radial basis function NN are used in this modelling scheme, which seems to be a good candidate for application in STLF since it uses a NARMAX (nonlinear ARMAX) model:

y(t) = f(y(t − 1), y(t − 2), . . . , y(t − ny); u(t − 1), u(t − 2), . . . , u(t − nu)) + e(t),

where f(·) is a nonlinear function, u is the input, y is the output, and e(t) is the noise.
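To make the NARMAX regressor structure concrete, the sketch below fits a purely linear lag basis by ordinary least squares. This is a degenerate special case of the Volterra-basis scheme of [56], shown only to illustrate how the lagged regressors and targets are assembled:

```python
# Least-squares fit of a NARMAX-type model restricted to a linear lag basis
# (richer Volterra/RBF bases would add cross-products and kernel terms).

def narmax_design(y, u, ny, nu):
    """Rows [y(t-1)..y(t-ny), u(t-1)..u(t-nu)] and targets y(t)."""
    start = max(ny, nu)
    rows = [[y[t - i] for i in range(1, ny + 1)] +
            [u[t - i] for i in range(1, nu + 1)]
            for t in range(start, len(y))]
    targets = [y[t] for t in range(start, len(y))]
    return rows, targets

def lstsq_fit(rows, targets):
    # solve the normal equations by Gaussian elimination (no numpy needed)
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    for i in range(k):                      # forward elimination with pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            A[r] = [a - f * c for a, c in zip(A[r], A[i])]
            b[r] -= f * b[i]
    theta = [0.0] * k
    for i in reversed(range(k)):            # back substitution
        theta[i] = (b[i] - sum(A[i][j] * theta[j]
                               for j in range(i + 1, k))) / A[i][i]
    return theta
```

On noise-free data generated by a linear lag model, the fitted coefficients recover the true parameters exactly.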

In [54] a NF algorithm based on the RBF (radial basis function) NN is tuned automatically using GAs. A linear mapping technique is employed to encode the GA chromosome, which consists of the centres and widths of the membership functions and also the weights of the NN. This scheme, called the NFCGA algorithm, was applied to control a coupled-tank liquid-level process and showed considerable robustness and benefits. Since RBF networks possess an increased capability for approximating complex nonlinear functions, the NFCGA algorithm is a good potential method for STLF and is currently under investigation by the authors' team. Of particular interest for STLF are also the hybrid CI methods proposed in [18, 39, 40, 76, 82, 84].

8. Case-Study Examples

To support most of the methodologies presented in Sections 3–7, we provide here a representative set of case-study examples selected from the references cited in the paper.

8.1. EXAMPLE 1: NN-BASED FORECASTING OF PEAK DAILY LOAD, TOTAL DAILY LOAD AND HOURLY LOAD

The NN topology used in this example [74] is simple, since only one hidden layer and a small number of neurons are involved. The training patterns were acquired from the Puget Sound Power and Light Company for the period 1 November 1988 through 30 January 1989. To evaluate the NN's forecasting performance, the MAPE criterion (see (15)) was applied to five different sets of test data.

Daily peak load forecast. A NN with three input neurons, five hidden neurons and one output neuron was used. The inputs are the average temperature, the peak temperature and the lowest temperature at the day of prediction (k). The neuron output L(k) is the predicted peak load at day k. The resulting MAPE values for the normal days (no holidays or weekends) are shown in Table II(a).

Daily total load forecast. The NN structure involves the same three inputs as before, five hidden neurons, and as output the total load at day k. The MAPE results are shown in Table II(b).

Hourly load forecast. In this case a one-hour lead time was examined, with six input neurons, ten hidden neurons and as output the hourly load forecast. The six inputs are k, L(k − 2), L(k − 1), T(k − 2), T(k − 1) and Y(k), where k is the hour of the predicted load, L(i) is the load at hour i, T(i) is the temperature at hour i, and Y(i) is the forecasted temperature for hour i. In the NN training, T(i) was used instead of Y(i). The lead times of the predicted temperatures Y(i) were between 16 and 40 hours. The MAPE results are shown in Table II(c).

The average values of MAPE for all five test data sets are: 2.00% for the peak load, 1.7% for the total load, and 1.4% for the hourly load forecast.
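The MAPE and MAE criteria ((15) and (14)), used throughout these case studies, amount to:

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (cf. Eq. (15))."""
    return 100.0 / len(actual) * sum(
        abs(a - f) / abs(a) for a, f in zip(actual, forecast))

def mae(actual, forecast):
    """Mean absolute error (cf. Eq. (14))."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
```

MAPE is scale-free and therefore comparable across utilities of different size, whereas MAE (used below for the temperature and humidity forecasters) stays in the physical units of the variable.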

8.2. EXAMPLE 2: NN-BASED FORECASTING FOR THE 365 DAYS OF A 1-YEAR PERIOD

Two NN topologies (see Figures 1(a) and 2) and three criteria (distances) for selecting the input training patterns (see (9a)–(c)) were used [71]. Therefore a total of six alternative cases were considered. The problem of generating forecasts for all the 365 days of a year, on the basis of historical data constructed for the previous year, was investigated. For this 1-year forecasting period and for each one of the six alternative cases, the MAPE index was computed for all 365 days.

The results obtained are shown in Table III, where the type I network is the NN of Figure 1(a) and the type II network is the NN of Figure 3 with the extra linear feedforward term.

The 90th percentile in Table III indicates that 10% of the errors exceed the corresponding MAPE value, thus providing a picture of how the tail of the error distribution behaves. One can see that the type II NN gives much better results than the type I NN, and also that the training set selected by criterion 3 gives the best performance when the type II NN is used. Figure 17 shows the plots of the actual and forecasted loads for this "type II NN, criterion 3" case. One can see that the forecasting accuracy is high for the whole period (21 October–24 November), except for the holidays, for which the errors are between 10% and 40%.


Table II. MAPE values for daily peak load, daily total load, and hourly load forecast with one-hour lead time

Days    Set 1   Set 2   Set 3   Set 4   Set 5

(a) Peak load
Day 1   4.19    1.89    0.72    1.69    1.83
Day 2   0.24    1.85    3.03    0.31    3.25
Day 3   0.58    2.44    0.95    2.72    2.68
Day 4   2.39    3.85    3.29    2.84    1.10
Day 5   0.35    4.26    0.65    6.64    0.56
Day 6   2.81    0.13    0.63    1.40    2.04
Avg.    1.73    2.40    1.55    2.60    1.91

(b) Total load
Day 1   0.34    0.26    2.66    1.03    0.42
Day 2   1.02    1.99    1.82    0.70    0.92
Day 3   3.47    1.03    3.25    0.66    1.42
Day 4   1.63    1.73    5.64    1.89    2.11
Day 5   1.04    0.88    4.14    0.03    0.27
Day 6   1.77    1.10    2.96    1.20    1.05
Avg.    1.78    1.07    3.39    1.15    1.03

(c) Hourly load
Day 1   (*)     1.20    1.41    1.17    (*)
Day 2   1.67    1.48    (*)     1.58    2.18
Day 3   1.08    (*)     1.04    (*)     1.68
Day 4   1.40    1.34    1.42    1.20    1.73
Day 5   1.30    1.41    (*)     1.20    (*)
Day 6   (*)     1.51    1.29    1.68    0.98
Avg.    1.35    1.39    1.29    1.36    1.64

Removing the holidays from the training and forecasting data, the results are much better. Specifically, in the above type II NN, criterion 3 case the figures obtained are: MAPE = 2.3314%, standard deviation = 1.9420 and 90th percentile = 4.9740.

Overall, the average MAPE for four months of 120 days, including weekdays, was found to be 1.66%, which is comparable to the overall average MAPE = 1.7% obtained in the previous example.

8.3. EXAMPLE 3: NN-BASED FORECASTING BY THE ANNSTLF SYSTEM

This example was drawn from [45] and concerns a load forecasting system known as ANNSTLF, which by the year 1997 was in use by 32 utilities in the USA and


Figure 17. Forecasted load compared with actual load for the period October 21 through November 24 (Day 32 is Thanksgiving day).

Table III. Absolute percentage error statistics for the six cases, including holidays in forecasting and 1-year training patterns

Criterion            #1       #2       #3

Type I network
Average              4.1156   4.2295   4.2474
Standard deviation   4.3369   4.1963   4.5237
90th percentile      8.716    9.463    9.158

Type II network
Average              3.1224   3.0939   2.9525
Standard deviation   3.8537   3.9522   3.1943
90th percentile      6.171    6.338    5.828

Canada. Two versions of the system were produced, the second one employing fewer MLPs and offering increased forecasting accuracy.

The main features of ANNSTLF are:

• It runs on an IBM-compatible PC under the MS-DOS, MS-Windows (3.1 and 95) and NT operating systems.

• It possesses a Windows-based GUI with a rich set of data management, performance analysis and graphics tools.

• Two to three years of hourly load and weather data are required for its training.
• For on-line operation it needs the actual hourly load and weather data of the previous day and hourly weather forecasts for future days.
• It can provide hourly load forecasts for up to 35 days ahead.
• It allows updating of forecasts on an hourly basis and reshaping/modification of forecasts by the user.


Table IV. Inputs and outputs of the three ANNSTLF modules

Weekly module
  Inputs: 24 hourly loads (same day, last week); 24 hourly temperatures (same day, last week); 24 hourly temperature forecasts (this day).
  Outputs: 24 hourly load forecasts (this day).

Daily module
  Inputs: 24 hourly loads (previous day); 24 hourly temperatures (previous day); 24 hourly temperature forecasts (this day).
  Outputs: 24 hourly load forecasts (this day).

Hourly module (all inputs at this hour)
  Inputs: yesterday's actual load; yesterday's actual temperature; yesterday's actual relative humidity; two days ago actual load; two days ago actual relative humidity; temperature forecast; relative humidity forecast; day index (Sunday 0.1, . . . , Saturday 0.7).
  Outputs: load forecast for this hour.

• It is equipped with validation filters for data quality checking and a rich variety of performance tracking and error analysis routines.

The basic element of ANNSTLF is the MLP, which is trained by the BP algorithm with adaptive updating of the synaptic weights during on-line operation. The first generation of the system has three MLP modules: an hourly forecast module with 9 inputs, a daily forecast module with 72 inputs, and a weekly forecast module with 72 inputs. The hourly module has 24 MLPs, and the daily and weekly modules have 7 MLPs each. The daily or weekly module has 24 outputs, i.e., the 24 hourly load forecasts for the day concerned or for each day of the week concerned. The inputs of these modules are given in Table IV. The final load forecast for each hour is found by a proper adaptive combination (based on recursive least squares) of the forecasts given for this hour by the three MLP modules.
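The adaptive combination step can be illustrated with a generic recursive-least-squares (RLS) combiner. This is a standard RLS sketch under assumed initial values (δ = 100, forgetting factor λ = 1); it is not the published ANNSTLF combiner, whose exact form is not given here.

```python
# Generic RLS combiner: learns weights w so that the dot product w . x
# approximates the actual load from the modules' forecasts x.

class RLSCombiner:
    def __init__(self, n, lam=1.0, delta=100.0):
        self.w = [0.0] * n
        self.lam = lam                         # forgetting factor (assumed)
        self.P = [[delta * (i == j) for j in range(n)] for i in range(n)]

    def update(self, x, target):
        n = len(x)
        # gain k = P x / (lam + x' P x)
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(xi * pi for xi, pi in zip(x, Px))
        k = [pi / denom for pi in Px]
        # weight update driven by the a priori combination error
        err = target - sum(wi * xi for wi, xi in zip(self.w, x))
        self.w = [wi + ki * err for wi, ki in zip(self.w, k)]
        # covariance update: P <- (P - k x' P) / lam
        self.P = [[(self.P[i][j] - k[i] * Px[j]) / self.lam
                   for j in range(n)] for i in range(n)]
        return sum(wi * xi for wi, xi in zip(self.w, x))
```

When the target is an exact linear combination of the module forecasts, the weights converge to the true combining coefficients within a few hundred updates.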

The first-generation ANNSTLF system had a forecasting accuracy better than other similar systems, but was not suitable for on-site training on a PC platform due to the large size of the MLPs. Also, due to the adaptive combiner of the modules' outputs, the redundancy existing in the input data of several modules caused certain smoothing phenomena, resulting in some noticeable reduction in the forecasting accuracy.

The second-generation ANNSTLF was designed to address these issues and includes only 24 small-size MLPs, one for each hour of the day. On the basis of a careful study of the weather-load relations, it uses three types of forecast indicators,


Table V. Additional inputs for the four groups of hours per day

Group 1 (hours 1–9), two inputs:
  Average load and temperature of the late hours of the previous day (i.e., the most recent information).

Group 2 (hours 10–14 and 19–22), eight inputs:
  Forecast temperatures of previous hours close to this hour; yesterday's load and temperatures of hours close to this hour.

Group 3 (hours 15–18), eight inputs:
  Forecast temperatures of previous and future hours close to this hour; yesterday's load and temperature of hours close to this hour.

Group 4 (hours 23–24), four inputs:
  Forecast temperatures of the four preceding hours.

namely: past loads, past weather, and forecast weather for the coming day, and also a suitable grouping of the hours of the day, namely:

• Early morning: hours 1–9 (19 inputs).
• Mid-morning, early afternoon, and early night: hours 10–14 and 19–22 (25 inputs).
• Afternoon peak hours: hours 15–18 (25 inputs).
• Late night: hours 23–24 (21 inputs).

The hours of each group are influenced by similar variables. Seventeen of the inputs are the same for all groups (same-hour load, temperature and relative humidity of one and two days ago; same-hour load and temperature of seven days ago; same-hour forecasts of temperature and relative humidity for the next day; seven binary inputs, one for each day of the week). The other inputs are as shown in Table V.

The special days are flagged during on-line operation and are treated as either a Saturday or a Sunday or a combination of them. On the basis of the above, the structure of the second-generation ANNSTLF system is as shown in Figure 18.

Two weather forecasting modules were included in ANNSTLF, namely an hourly temperature forecaster (TF), which consists of a single adaptive 28-input/24-output MLP, and an hourly relative humidity forecaster (RHF), which is a non-NN-based module using a smoothed average of previous days to forecast the hourly values of the next day. A NN-based RHF was later incorporated in ANNSTLF [43].

The performance of both ANNSTLF generations was tested by applying the system to ten utilities in various geographic regions using 3-year training data. For the load forecasts the MAPE measure was used, while for the TF and RHF the MAE index was employed (see (14) and (15)).

Some of the results obtained are shown in Tables VI–IX and Figures 19–21. For the 1-day-ahead forecast, an error below 3% is generally accepted by the electric utility industry as quite satisfactory. The above results show that the second-generation ANNSTLF outperforms the first-generation system, and also that the errors do not increase very much as the forecast horizon increases.


Figure 18. Structure of 2nd-generation ANNSTLF.

Table VI. MAPE values of load forecasts for a six-month test using data from ten utilities

Forecast of   Generation   Days ahead
                           1     2     3     4     5     6     7

All hours     Two          2.19  2.53  2.98  3.13  3.38  3.59  3.67
              One          2.52  2.87  3.21  3.53  3.73  3.88  4.00
Peak load     Two          2.48  2.87  3.03  3.23  3.39  3.52  3.66
              One          2.98  3.25  3.58  3.89  3.91  4.02  4.12
Daily total   Two          1.70  2.39  2.47  2.64  2.68  2.72  2.82
              One          2.26  2.88  3.02  3.05  3.10  3.16  3.19

8.4. EXAMPLE 4: OPTIMAL SIMPLIFIED T–S/FL-BASED 1-HOUR FORECASTING

This example presents the results obtained by using the simplified T–S fuzzy-reasoning-based STLF method [68]. This method, when tested in approximating the nonlinear function

y = x1 − x2² + sin((5/2)πx3),


Table VII. Samples of performance of the second-generation engine with on-line weather forecast inputs. The reported MAPE values are for all hours

Utility   Period in '96   Days ahead
                          1     2     3     4     5     6     7

1         8/21–11/19      3.22  4.09  4.32  4.68  5.18  5.77  6.41
2         10/26–11/25     2.54  3.14  3.53  3.94  4.43  5.16  5.72
3         10/26–11/25     2.47  3.28  3.66  4.07  4.57  4.87  5.64
4         9/5–11/19       2.15  2.78  2.95  3.26  3.79  4.38  4.89
5         2/1–11/20       2.81  3.99  4.65  5.25  5.98  6.53  7.07

Table VIII. Average MAE values of the temperature forecasts for ten utilities

Days ahead   MAE (degrees F)
1            1.49
2            1.55
3            1.58
4            1.58
5            1.60
6            1.61
7            1.61

Table IX. Average MAE values of the humidity forecasts for nine utilities

Days ahead   MAE (%)
1            5.49
2            5.92
3            5.90
4            5.94
5            5.94
6            5.99
7            6.07

gave results of almost the same accuracy as the best case of a set of MLPs with 5–15 hidden units. The best NN corresponds to 9 hidden units, a learning rate η = 0.1 and a momentum rate ρ = 0.9. The convergence criterion of the BP algorithm used is 30,000 iterations. The method was applied using learning and test data prepared for two months and one month, respectively (September, October; November), with a 1-hour sampling period. One-step-ahead prediction was made (1-hour forecast),


Figure 19. Actual and forecasted load for a seven-day period (solid line: actual, dashed line: forecast).

Figure 20. Example of actual and forecasted temperatures for one-to-seven days ahead (solid line: actual, dashed line: forecast).

using ten different initial conditions to obtain better models for the T–S method and the compared NN-based method. The resulting optimal fuzzification of the three variables x1, x2 and x3 is as shown in Figure 22.
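The backpropagation setup described above (sigmoid hidden layer, learning rate η = 0.1, momentum ρ = 0.9) can be sketched as a single weight-update step; the 3-input/9-hidden/1-output shapes and the random initialisation scale are illustrative assumptions of ours, not details taken from [68]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 3-9-1 network (9 hidden units, as in the best NN reported above);
# the input dimension and initialisation scale are assumptions.
W1 = rng.normal(0.0, 0.5, (9, 3))
W2 = rng.normal(0.0, 0.5, (1, 9))
V1, V2 = np.zeros_like(W1), np.zeros_like(W2)   # momentum accumulators
eta, rho = 0.1, 0.9                             # learning rate, momentum rate

def step(x, target):
    """One backpropagation step with momentum; returns the squared error."""
    global W1, W2, V1, V2
    h = 1.0 / (1.0 + np.exp(-(W1 @ x)))              # sigmoid hidden activations
    e = W2 @ h - target                              # linear output minus target
    dW2 = np.outer(e, h)                             # output-layer gradient
    dW1 = np.outer((W2.T @ e) * h * (1.0 - h), x)    # hidden-layer gradient
    V2 = rho * V2 - eta * dW2                        # momentum updates
    V1 = rho * V1 - eta * dW1
    W2 += V2
    W1 += V1
    return float(e[0] ** 2)
```

Iterating `step` over the training pairs until the iteration budget (30,000 in the study above) is exhausted reproduces the training loop in outline.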

Table X gives the forecasting performance of the T–S method and the MLP NN method with 5–15 hidden units (HU). The average (Ave) and maximum (Max) errors as well as the standard deviation (SD) are shown for both the learning period (September and October) and the testing period (November).

In the learning data, non-standard load patterns were included (due to holidays and seasonal transition). The maximum error of the T–S method is 8.146% (on November 24, a Monday holiday), which is smaller than the minimum of the maximum errors provided by the various MLPs. Similarly, the standard deviation SD = 29.12


Figure 21. Example of actual and forecasted humidities for one-to-seven days ahead (solid line: actual; dashed line: forecast).

Figure 22. Fuzzification of the variables x1, x2 and x3 provided by the method.

of the T–S method is always smaller than the SD given by the MLPs. Therefore the T–S FL method outperforms the NN method.

8.5. EXAMPLE 5: NEUROFUZZY STLF

The fuzzy ART method of Section 6.3 was tested using the well-known Mackey–Glass benchmark time series [111]:

dx(t)/dt = 0.2x(t − τ)/(1 + x^10(t − τ)) − 0.1x(t),   x(0) = 1.2,   τ = 17,

which has the form shown in Figure 23.


Table X. Comparison of the errors (average, maximum) and the standard deviations of the T–S and the NN methods

Methods Learning data (September and October) Test data (November)

Ave (%) Max (%) SD Ave (%) Max (%) SD

Proposed method 1.370 12.14 34.17 0.962 8.146 29.12

MLP HU = 5 1.464 11.89 35.65 1.056 9.449 30.93

HU = 6 1.382 12.02 34.61 0.935 9.081 28.22

HU = 7 1.358 11.95 34.20 0.954 9.195 28.28

HU = 8 1.403 11.92 35.17 1.046 9.447 30.52

HU = 9 1.394 11.85 35.86 1.053 9.522 31.17

HU = 10 1.435 11.94 34.55 1.000 9.242 29.72

HU = 11 1.382 12.05 29.55 0.994 9.369 29.91

HU = 12 1.425 11.91 34.94 1.040 9.483 30.56

HU = 13 1.389 11.92 34.70 1.024 9.413 30.35

HU = 14 1.352 11.93 33.75 0.962 9.217 28.75

HU = 15 1.385 11.84 34.61 1.046 9.394 30.88

Figure 23. The Mackey–Glass time series for 0 ≤ t ≤ 1200.

Using data in the interval from t = 24 to t = 1124, 1000 input–output pairs were obtained as

x(t − 24), x(t − 18), x(t − 12), x(t − 6); x(t),

i.e., the current value x(t) is predicted using four past values. The first 500 pairs were used for training and the second 500 for testing. The results obtained using 12 and 16 fuzzy rules are shown in Table XI.
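As an illustration, the benchmark data and the windowed input–output pairs can be regenerated with a simple Euler integration of the Mackey–Glass delay-differential equation; the unit step size and the constant initial history are our simplifying assumptions:

```python
import numpy as np

def mackey_glass(n_steps, tau=17, dt=1.0, x0=1.2):
    """Euler integration of dx/dt = 0.2*x(t-tau)/(1 + x(t-tau)**10) - 0.1*x(t)."""
    lag = int(tau / dt)
    x = [x0] * (lag + 1)              # constant initial history (an assumption)
    for _ in range(n_steps):
        x_tau = x[-lag - 1]           # delayed value x(t - tau)
        dx = 0.2 * x_tau / (1.0 + x_tau ** 10) - 0.1 * x[-1]
        x.append(x[-1] + dt * dx)
    return np.array(x[-n_steps:])

series = mackey_glass(1200)

# 1000 windowed input-output pairs: inputs x(t-24), x(t-18), x(t-12), x(t-6),
# target x(t); the first 500 pairs train, the last 500 test.
pairs_t = range(24, 1024)
X = np.array([[series[t - 24], series[t - 18], series[t - 12], series[t - 6]]
              for t in pairs_t])
y = np.array([series[t] for t in pairs_t])
X_train, y_train, X_test, y_test = X[:500], y[:500], X[500:], y[500:]
```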

Table XII shows the results obtained by other methods. One can see that only ANFIS outperforms the fuzzy ART method.

This method was also used to forecast the final boiling point of naphtha in a Greek refinery, giving results much better (∼4–5 times smaller MAE errors) than an MLP (26 inputs, 18 hidden neurons, 1 output) [108].


Table XI. Simulation results for the Mackey–Glass time series

Number of fuzzy rules RMSE training data RMSE testing data Iterations Time (in seconds)

1 N = 12 0.0015 0.0027 1200000 1245

2 N = 16 0.0015 0.0026 726000 845

The result is the average of ten runs for each case.

Stopping criterion: RMSE < 0.0015 (on the training data).

Fuzzy implication: 5M(x)

M = 4, P = 500, α = 0.01, β = 0.01, ρ = 0, lrm = 0.0, lr2 = 0.5, lr = 0.25 (see [111]).

Table XII. Generalization results for various approaches to the Mackey–Glass time series prediction problem. In all these cases, testing was performed on the 500 data points coming just after the training data points

Method Training cases Non-dimensional error index

Proposed method (12 fuzzy rules) 500 0.013

Proposed method (16 fuzzy rules) 500 0.012

ANFIS (16 fuzzy rules, each with fewer adaptive parameters than those of the proposed approach) 500 0.007

AR model 500 0.19

Cascade-correlation NN 500 0.06

BP NN 500 0.02

Sixth-order polynomial 500 0.04

Linear predictive method 2000 0.55

The dedicated NF STLF method of [86] was applied to real load data of a 1-year period (October 1991–September 1992). The MAPE values obtained for October 1992 are shown in Figure 24, where the testing data include Sundays and a public holiday. The error values are higher on weekends and holidays than on normal weekdays. This is shown in Figures 25(a–e), where the hourly forecasting results for the five days included in the dotted period of Figure 24 (Friday 23 October through Tuesday 27 October 1992) are depicted. From this figure one can see that the NF MAPE error is about half the NN error, especially during the holiday period. The holiday forecasting error can be reduced further if special rules for holidays from the historical database are incorporated in the NF system.

8.6. EXAMPLE 6: CHAOS-BASED FL STLF

This forecasting method has been tested in several other applications, namely logistic map time series, the Lorenz system, local weather prediction, stock price forecasting, prediction of demand for tap water, prediction of traffic density on expressways,


Figure 24. MAPE values for October 1992.

etc. [33–35]. In the field of electric load forecasting, it was applied to the case of an independent power system, specifically the system of an isolated island. Without good load forecasts, the generators of the system are forced to start and stop more frequently than necessary, which leads to increases in the maintenance cost.

The forecasting results are shown in Figure 26(a), (b); they were obtained using 1467 data points observed every hour. The first half of the observed time-series data is used as the known initial values for predicting the time series s steps ahead. Then the value of the next step is observed, after which another s-steps-ahead prediction is made. This process is repeated until all the observed values are employed. In the actual forecasting, 1000 data points from the first half of the time series were used as initial values.

The following parameters were used: embedding dimension n = 7, delay time t = 3, and number of neighboring data N = 3. The one-step-ahead forecasting gave a correlation coefficient r = 0.96 and an RMSE of 61.2.
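A minimal sketch of this local reconstruction scheme — delay embedding followed by nearest-neighbour averaging — is given below; this is our own generic (crisp) variant, not the exact fuzzy reconstruction of [33–35]. The defaults mirror the parameters above (n = 7, delay 3, 3 neighbours):

```python
import numpy as np

def local_forecast(series, n=7, delay=3, k=3):
    """One-step-ahead forecast by local reconstruction: embed the series in
    n dimensions with the given delay, find the k past states closest to the
    latest state, and average their observed successors."""
    series = np.asarray(series, dtype=float)
    span = (n - 1) * delay
    # all embedded states whose successor is known
    states = np.array([series[i:i + span + 1:delay]
                       for i in range(len(series) - span - 1)])
    successors = series[span + 1:]
    current = series[-span - 1::delay]               # latest embedded state
    dist = np.linalg.norm(states - current, axis=1)  # distances in embedding space
    nearest = np.argsort(dist)[:k]                   # k nearest neighbours
    return float(successors[nearest].mean())
```

Repeating the call after appending each newly observed value reproduces the rolling s-steps-ahead procedure described above for s = 1.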

8.7. EXAMPLE 7: HYBRID STATISTICAL/KB STLF

This example illustrates the results achieved by the combination of the statistical and expert system approaches [27]. The algorithm described at the end of Section 3 was applied with parameters generated off-line using the available data only. All the other information required, namely daily time assignments, temperature selection intervals and adjustment values, was generated without any specific knowledge of the sites considered. All these parameters were kept the same for all sites, except of course for the adjustment values. Some representative results of this hybrid method for four sites over a 1-month period are shown in Tables XIII and XIV. More results can be found in [27].

8.8. EXAMPLE 8: HYBRID GA-BASED METHODS

GA-based methods have a very high potential for successful application in load forecasting and in forecasting other processes, with expected accuracy higher than


Figure 25. Forecasting results for a period of five days including a weekend and a Monday holiday.

that of other methods. To give some evidence of this, a small set of GA-based results is given in this example. In [83] the standard GA of Goldberg [19] was applied to optimize T–S FL systems, as described in Sections 5 and 7.3.2. The fuzzy rule base was designed and optimized to approximate the following discontinuous function:


Figure 26. Load forecasting for an isolated island's system: (a) one-step-ahead prediction of the power demand; (b) variation of the correlation coefficient with the prediction step.

y = −9x² + 3x + 1/3,                  0 ≤ x ≤ 1/3,
y = 2x − 1/3,                         1/3 < x ≤ 1/2,
y = 0.03/(0.03 + (4.5x − 3.85)²),     1/2 < x ≤ 1.
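For reference, this piecewise test function can be written directly in code together with the quadratic error J that the GA minimises over the samples (the uniform 100-point sampling of [0, 1] is our assumption):

```python
import numpy as np

def target(x):
    """Piecewise test function approximated by the GA-optimised fuzzy system [83]."""
    if x <= 1 / 3:
        return -9 * x ** 2 + 3 * x + 1 / 3      # quadratic branch
    if x <= 1 / 2:
        return 2 * x - 1 / 3                    # linear branch
    return 0.03 / (0.03 + (4.5 * x - 3.85) ** 2)  # sharp peak near x = 0.856

# 100 samples over [0, 1] (uniform spacing is an assumption)
xs = np.linspace(0.0, 1.0, 100)
ys = np.array([target(x) for x in xs])

def quadratic_error(model):
    """Quadratic error J of a candidate model over the samples -- the fitness
    the GA minimises (J_opt = 0.065 was reached in the experiment above)."""
    return float(sum((model(x) - y_i) ** 2 for x, y_i in zip(xs, ys)))
```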

This function, which is depicted in Figure 27, possesses a discontinuity at x = 1/2, a discontinuity of the derivative at x = 1/3, and a peak with a high-curvature interval around x = 0.85. Figure 27 shows the approximation result achieved with 100 samples, 500 chromosomes, a mutation rate of 0.005, a crossover rate of 0.6, twenty rules, 8 bits for coding the fuzzy parameters, and 400 generations (55 min CPU time on a Sun Sparc Station 10). One can see that a very good approximation of y was obtained (minimum value of the quadratic error J in (54): Jopt = 0.065). However, the linguistic interpretation of the resulting optimal membership functions is rather difficult. In [83] an interesting study of the influence of the population size, the mutation rate, the crossover rate, the number of fuzzy rules and the parameter coding length was also conducted. The general conclusions are the following:


Table XIII. Absolute forecasting errors for January weekdays

Hour Massachusetts Washington Virginia Florida

% error SD % error SD % error SD % error SD

0 1.57 1.03 1.88 1.03 2.10 1.03 2.17 1.01

1 1.41 1.03 1.61 1.02 2.29 1.04 1.81 1.02

2 1.61 1.02 1.45 1.03 2.69 1.08 1.59 1.01

3 1.71 1.02 1.42 1.06 2.47 1.03 1.74 1.02

4 1.83 1.03 1.40 1.02 2.06 1.01 1.66 1.06

5 1.65 1.02 2.13 1.05 1.98 1.02 3.21 1.21

6 1.83 1.01 1.80 1.07 2.12 1.07 3.33 1.25

7 1.75 1.02 2.18 1.01 2.31 1.03 3.23 1.24

8 1.51 1.04 2.18 1.05 2.28 1.06 2.82 1.12

9 1.62 1.07 1.44 1.01 2.29 1.03 2.75 1.17

10 1.65 1.09 1.40 1.07 2.24 1.03 2.67 1.01

11 2.06 1.13 1.78 1.09 2.52 1.04 2.50 1.09

12 1.95 1.09 2.14 1.03 2.80 1.06 2.79 1.13

13 1.92 1.09 1.89 1.07 2.80 1.06 2.32 1.10

14 2.18 1.10 2.26 1.03 2.92 1.05 2.41 1.04

15 2.53 1.09 2.58 1.12 2.13 1.02 2.81 1.03

16 2.50 1.07 3.19 1.39 1.77 1.02 3.02 1.07

17 1.72 1.06 2.60 1.37 1.83 1.15 2.69 1.19

18 2.25 1.20 1.97 1.37 1.77 1.03 3.15 1.26

19 2.79 1.22 2.15 1.19 1.99 1.10 3.24 1.19

20 1.99 1.20 2.41 1.35 2.81 1.24 2.90 1.18

21 1.83 1.07 1.81 1.29 2.05 1.11 2.43 1.07

22 1.77 1.04 1.50 1.22 2.01 1.04 1.67 1.04

23 1.60 1.03 1.47 1.20 1.83 1.08 1.76 1.07

avg 1.88 1.07 1.94 1.13 2.25 1.06 2.53 1.11

(i) Large populations (>1000 chromosomes) are not preferable to middle-size populations: they give smaller errors but a lower convergence speed.

(ii) The best error performance (0.002 ≤ Jopt ≤ 0.010) was achieved with a mutation rate in the interval [0.001, 0.005] (the mutation rate was verified to be a dominant factor for getting good results).

(iii) A good accuracy (Jopt ≤ 0.005) was obtained with a crossover rate higher than or equal to 0.6 (e.g., the result obtained with a crossover rate in the interval [0, 0.3] is Jopt = 0.020). In general, however, the crossover rate does not have a significant effect; surprisingly, good results were obtained even without crossover (i.e., only through mutations and selections).


Table XIV. Absolute forecasting errors for January weekends

Hour Massachusetts Washington Virginia Florida

% error SD % error SD % error SD % error SD

0 1.20 1.00 2.00 1.02 3.50 1.23 1.35 1.01

1 1.23 1.00 1.67 1.03 3.78 1.35 1.51 1.00

2 1.23 1.00 1.31 1.02 3.43 1.31 2.11 1.00

3 1.21 1.00 1.18 1.00 3.35 1.92 2.29 1.01

4 1.31 1.00 0.92 1.02 3.20 1.94 2.33 1.01

5 1.29 1.00 0.68 1.01 3.40 1.98 2.32 1.01

6 1.48 1.00 0.96 1.00 3.01 1.84 3.63 1.00

7 1.42 1.00 1.92 1.03 2.23 1.95 5.20 1.04

8 0.86 1.00 2.58 1.04 2.02 1.03 3.87 1.04

9 0.82 1.00 2.19 1.02 2.65 1.09 3.23 1.01

10 1.23 1.00 1.79 1.02 2.88 1.35 3.23 1.01

11 1.49 1.00 2.93 1.03 2.66 1.30 4.83 1.00

12 2.09 1.00 3.12 1.07 1.80 1.00 6.96 1.05

13 2.67 1.00 3.62 1.04 2.34 1.05 7.08 1.06

14 2.38 1.01 3.50 1.09 2.44 1.04 2.84 1.00

15 2.04 1.00 4.16 1.07 2.42 1.03 2.67 1.00

16 1.84 1.00 4.53 1.07 2.76 1.02 2.87 1.01

17 1.33 1.00 4.10 1.07 2.75 1.11 3.26 1.00

18 0.83 1.00 1.14 1.02 1.56 1.01 3.50 1.01

19 0.90 1.00 1.90 1.03 2.28 1.10 2.31 1.00

20 1.35 1.00 2.12 1.06 1.60 1.01 3.08 1.00

21 1.34 1.00 0.97 1.01 2.46 1.21 2.18 1.01

22 1.59 1.00 1.40 1.00 2.26 1.14 2.35 1.00

23 1.57 1.00 1.45 1.01 2.32 1.19 2.12 1.00

avg 1.45 1.00 2.17 1.03 2.63 1.30 3.21 1.01

(iv) A good final result was achieved with 20 rules (as a compromise). A similar accuracy was obtained with more rules, but with much lower convergence speed.

(v) The parameter coding length proved to have a small effect on the result. Actually, the convergence speed does not drop much with longer coding, so one may expect that a longer optimization run will give better results with longer parameter coding.

Similar results for the simplified T–S problem were reported in the independent work of [51] (see Section 7.3.2). Of particular interest are also the results reported in [54]. In this study MIMO fuzzy rules of the form (18) were used with Gaussian


Figure 27. Results of approximation and associated optimum membership functions.

(bell-type) membership functions. Then, using radial basis function (RBF) NNs with Gaussian functions of the same type as those used for the membership functions in the fuzzy rule base, the RBF NN and the FL algorithm were integrated to form a class of NF reasoning algorithms. By choosing the centers and widths of the RBFs which constitute the fuzzy sets' membership functions, the NN can be used to represent the fuzzy rule base, so each fuzzy rule is processed in an RBF NN node. Here, the GA is used to optimally determine the centers and widths of the Gaussian RBFs. In the experiments of [54] each of the input variables was represented by five membership functions, so a total of 20 parameters were selected by the GA. These results can be easily imported into the STLF application.
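The correspondence between RBF nodes and fuzzy rules can be sketched as a normalised Gaussian-basis inference step; the `centers`/`widths`/`weights` interface below is illustrative (in the GA-tuned system of [54] the centers and widths would form the chromosome being optimised):

```python
import numpy as np

def rbf_fuzzy_infer(x, centers, widths, weights):
    """Normalised Gaussian-RBF inference, with each node acting as one fuzzy rule.

    centers, widths: (rules, inputs) arrays defining each rule's Gaussian
    membership functions; weights: (rules,) rule consequents.
    """
    x = np.asarray(x, dtype=float)
    # firing strength of each rule = product of its Gaussian memberships,
    # computed here as exp of the summed squared normalised distances
    act = np.exp(-np.sum(((x - centers) / widths) ** 2, axis=1))
    # weighted-average defuzzification over the rule consequents
    return float(act @ weights / act.sum())
```

Inputs close to a rule's center drive the output toward that rule's consequent, which is exactly the sense in which each RBF node "is" a fuzzy rule.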

9. Conclusions

This paper has presented a comprehensive overview of the application of computational intelligence methodologies to the short-term load forecasting problem of electric utilities. This review demonstrates the high level of maturity achieved in this field, and the important results obtained by the practical application and use of these methodologies. CI involves NNs, FL, GAs, chaos and all possible combinations, not only among them, but also with statistical/model-based and expert/knowledge-based methods. The application of pure NN-based and FL-based methods has been studied more extensively, but the hybrid (combined) methods have only recently started receiving considerable attention. Especially the use of GAs for optimizing the critical parameters involved in the various forecasting methods is expected to increase further the level of forecasting accuracy.


The case-study examples provided in the paper are by no means exhaustive but representative, and serve to show the type of results and the level of accuracy achieved in actual operating electric utilities. Globally comparing the mean absolute percentage error of the daily load forecast obtained by the NN, the FL (T–S) and the NF methods (as presented in the examples of Section 8), one has the following picture: (i) NN: 1.66–2.60% (average), (ii) FL (T–S): 0.962%, (iii) NF: 0.58–1.26% (weekdays), 0.68–1.15% (special days and weekends), (iv) hybrid statistical/KB: 1.88–2.53% (weekdays), 1.45–3.21% (weekends). The errors obtained by the NN forecaster on special days/weekends under the same conditions as the NF forecaster are about double. One can see that the best performance is achieved by the Takagi–Sugeno functional reasoning method or the NF method, especially on peculiar days. However, with special care in selecting the training data, and with the use of GAs (whenever possible), some further improvement of the performance of all methods may naturally be expected. The authors' group is currently investigating the application of GAs in the T–S and the fuzzy ART forecasting models.

Acknowledgement

This survey study has been conducted in the framework of the Brite-Euram III IFS Thematic Network: Intelligent Forecasting Systems for Refineries and Power Systems (Project: BRRT-CT97-5023).

References

1. Abe, S. and Lan, M.-S.: Efficient methods for fuzzy rule extraction from numerical data, in: C. H. Chen (ed.), Fuzzy Logic and Neural Network Handbook, McGraw-Hill, New York, 1996, pp. 7.1–7.33.

2. Asbury, C.: Weather load model for electric demand energy forecasting, IEEE Trans. Power Apparatus Systems 97 (1975), 1111–1116.

3. Bakirtzis, A. G. et al.: A neural network short term load forecasting model for the Greek power system, IEEE Trans. Power Systems 11 (1996), 858–863.

4. Bakirtzis, A. G. et al.: Short term load forecasting using fuzzy neural networks, IEEE Trans. Power Systems 10 (1995), 1518–1524.

5. Beale, R. and Jackson, T.: Neural Computing: An Introduction, Adam Hilger, Bristol/Philadelphia, 1990.

6. Bolzern, P. and Fronza, G.: Role of weather inputs in short-term forecasting of electric load, Electric Power Energy Systems 8(1) (1986), 42–46.

7. Box, G. E. P. and Jenkins, G. M.: Time Series Analysis: Forecasting and Control, Holden Day, Oakland, CA, 1976.

8. Carpenter, G. A., Grossberg, S., and Rosen, D. B.: Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system, Neural Networks 4 (1991), 759–771.

9. Chen, T., Yu, D., and Moghaddamjo, A.: Weather sensitive short term load forecasting using nonfully connected artificial networks, IEEE Trans. Power Systems 7(3) (1992), 1098–1105.

10. Christiannse, W. R.: Short term load forecasting using general exponential smoothing, IEEE Trans. Power Apparatus and Systems 90(2) (1971).


11. Commarmond, P. and Ringwood, J. V.: Quantitative fuzzy modeling of short-time-scale electricity consumption, in: Proc. of Irish Signals and Systems Conf. (ISSC'97), Derry, June 1997.

12. Dalianis, P., Kitsios, Y., and Tzafestas, S. G.: Graph colouring using fuzzy controlled neural networks, Intell. Automat. Soft Computing 4(4) (1998), 273–288.

13. De Martino, B., Fusco, G., Mariani, E., Randino, R., and Ricci, P.: A medium and short-term load forecasting model for electric industry, in: IEEE Power Industry Computer Applications Conf., 1979, pp. 186–191.

14. Dehdashti, A. S.: Forecasting of hourly load by pattern recognition: A deterministic approach, IEEE Trans. Power Apparatus Systems 101 (1982), 3290–3294.

15. Dillon, T. S., Sesito, S., and Leung, S.: Short term load forecasting using an adaptive neural network, Electric Power Energy Systems 13(4) (1991), 186–192.

16. El Sharkawi, M. A. and Huang, S. J.: Application of genetic-based neural networks to power system static security assessment, in: Proc. of ISAP '94, 1994, pp. 423–429.

17. Fahn, C.-S., Lan, K.-T., and Chern, Z.-B.: Fuzzy rules generation using new evolutionary algorithms combined with multilayer perceptrons, IEEE Trans. Industr. Electronics 46(6) (1999), 1103–1113.

18. Fonseca, C. M., Mendes, E. M., Fleming, P. J., and Billings, S. A.: Nonlinear model-term selection with genetic algorithms, in: Proc. of IEE/IEEE Workshop on Natural Algorithms for Signal Processing, 1993, pp. 27/1–27/8.

19. Goldberg, D. E.: Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA, 1989.

20. Gross, G. and Galiana, F. D.: Short-term load forecasting, Proc. IEEE, December 1987.

21. Gupta, P. C.: A stochastic approach to peak power demand forecasting in electric utility systems, IEEE Trans. Power Apparatus Systems 90(3) (1971).

22. Hagan, M. T. and Behr, S. M.: The time series approach to short term load forecasting, IEEE Trans. Power Systems (1987), 785–791.

23. Hara, K., Kimura, M., and Honda, N.: A method for planning economic unit commitment and maintenance of thermal power systems, IEEE Trans. Power Apparatus Systems 85(5) (1966), 427–436.

24. Haykin, S.: Neural Networks: A Comprehensive Foundation, MacMillan, New York, 1994.

25. Heider, H. and Drabe, T.: A cascaded genetic algorithm for improving fuzzy-system design, Internat. J. Approx. Reasoning 17 (1997), 351–368.

26. Hirota, K.: Industrial Applications of Fuzzy Technology, Springer, Berlin/Tokyo, 1993.

27. Ho, K.-L., Hsu, Y.-Y., Chen, C.-F., and Lee, T.-L.: Short term load forecasting of Taiwan power system using a knowledge-based expert system, IEEE Power Engrg. Rev. (November 1990).

28. Ho, K.-L., Hsu, Y.-Y., Chen, C.-F., Lee, T.-L., Liang, C. C., Lai, T. S., and Chen, K. K.: Short term load forecasting of Taiwan power system using a knowledge-based expert system, IEEE Trans. Power Systems 5(4) (1990), 1214–1221.

29. Ho, K.-L., Hsu, Y.-Y., and Yang, C.-C.: Short term load forecasting using a multi-layer neural network with an adaptive learning algorithm, IEEE Trans. Power Systems 7(1) (1992), 141–149.

30. Holland, J. H.: Adaptation in Natural and Artificial Systems, The Univ. of Michigan Press, Michigan, 1975.

31. Hripsak, G.: Problem solving using neural networks, M.D. Computing 5(3), 25–37.

32. Huang, Y.-P. and Huang, C.-H.: Real-valued genetic algorithms for fuzzy gray prediction system, Fuzzy Sets Systems 87 (1997), 265–276.

33. Iokibe, T., Fujimoto, Y., Kanke, M., and Suzuki, S.: Short-term prediction of chaotic time series by local fuzzy reconstruction method, J. Intell. Fuzzy Systems 5 (1997), 3–21.

34. Iokibe, T., Kanke, M., Fujimoto, Y., and Suzuki, S.: Short-term prediction on chaotic time series by local fuzzy reconstruction method, in: Proc. of the 3rd Internat. Conf. on Fuzzy Logic, Neural Nets and Soft Computing, Iizuka, Japan, 1994, pp. 491–492.


35. Iokibe, T. and Mochizuki, N.: Short-term prediction based on fuzzy logic and chaos, in: Proc. of FAN Symp. '94, Tsukuba, Japan, 1994, pp. 77–82.

36. Ishibuchi, H. and Nakashima, T.: Improving the performance of fuzzy classifier systems for pattern classification problems with continuous attributes, IEEE Trans. Industr. Electronics 46(6) (1999), 1057–1068.

37. Jagielska, I., Matthews, C., and Whitfort, T.: An investigation into the application of neural networks, fuzzy logic, genetic algorithms, and rough sets to automated knowledge acquisition for classification problems, Neurocomputing 24 (1999), 37–54.

38. Jimenez, J., Moreno, J., and Ruggeri, G.: Forecasting on chaotic time series: A local optimal linear reconstruction method, Phys. Rev. A 45 (1992), 3553–3558.

39. Kadirkamanathan, V.: Bayesian inference for basis function selection in nonlinear system identification using genetic algorithms, in: J. Skilling and S. Sibisi (eds), Maximum Entropy and Bayesian Methods, Kluwer, Boston/Dordrecht, 1995.

40. Katayama, R., Kajittani, Y., Kuwate, K., and Nishida, Y.: Self generating radial basis function as neuro-fuzzy model and its applications to nonlinear prediction of a chaotic time series, in: Proc. of the 2nd IEEE Internat. Conf. on Fuzzy Systems, Vol. 1, San Francisco, 1993, pp. 407–414.

41. Kengskool, K., Gross, M., and Martinez, R. M.: Forecasting techniques advisory system, Internat. J. Appl. Engrg. Education 6(2) (1990), 171–175.

42. Khotanzad, A., Abaye, A., and Maratukulam, D.: Forecasting power system peak loads by an adaptive neural network, in: C. H. Dagli et al. (eds), Intelligent Engineering Systems Through Artificial Neural Networks, Vol. 3, ASME Press, New York, 1993, pp. 891–896.

43. Khotanzad, A., Afkhami-Rohani, R., Lu, T.-L., Abaye, A., Davis, M., and Maratukulam, D. J.: ANNSTLF – A neural-network-based electric load forecasting system, IEEE Trans. Neural Networks 8(4) (1997), 835–846.

44. Khotanzad, A., Davis, M. H., Abaye, A., and Maratukulam, D. J.: An artificial neural-network hourly temperature forecaster with applications in load forecasting, IEEE Trans. Power Systems 11(3) (1996), 870–876.

45. Khotanzad, A., Hwang, R. C., Abaye, A., and Maratukulam, D.: An adaptive modular artificial neural network hourly load forecaster and its implementation at electric utilities, IEEE Trans. Power Systems 10(4) (1995), 1716–1722.

46. Kosko, B.: Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence, Prentice-Hall, Englewood Cliffs, NJ, 1991.

47. Kosko, B.: Fuzzy systems as universal approximators, IEEE Trans. Comput. 43(11) (1994), 1329–1333.

48. Kosko, B.: Additive fuzzy systems: From function approximation to learning, in: C. H. Chen (ed.), Fuzzy Logic and Neural Network Handbook, McGraw-Hill, New York, 1996, pp. 9.1–9.22.

49. Lambert-Torres, G., Traore, C. O., Lagace, P. J., and Mukhedkar, D.: A knowledge engineering tool for load forecasting, in: IEEE Symp. on Circuits and Systems, Calgary, Canada, 1990, pp. 144–147.

50. Lee, K. Y., Cha, T., and Park, J. H.: Short term load forecasting using an artificial neural network, IEEE Trans. Power Systems 7(1) (1992), 124–132.

51. Lee, M.-R.: Generating fuzzy rules by genetic method and its application, Internat. J. Artificial Intell. Tools 7(4) (1998), 399–413.

52. Lekova, A., Mikhailov, L., Boyadjiev, D., and Nabout, A.: Redundant fuzzy rules exclusion by genetic algorithms, Fuzzy Sets Systems 100 (1998), 235–243.

53. Lewis, C. D.: Forecasting in Operations Management and Practice, Philip Allan, Oxford, 1981.

54. Lian, S. T., Marzuki, K., and Rubiyah, Y.: Tuning of a neurofuzzy controller by genetic algorithms with an application to a coupled-tank liquid-level control system, Engrg. Appl. Artificial Intell. 11 (1998), 517–529.


55. Lin, Y. and Cuningham III, G. A.: A new approach to fuzzy-neural system modeling, IEEE Trans. Fuzzy Systems 1(1) (1993), 7–31.

56. Liu, G. P. and Kadirkamanathan, V.: Multiobjective criteria for neural network structure selection and identification of nonlinear systems using genetic algorithms, IEE Proc. Control Theory Appl. 146(5) (1999), 373–382.

57. Ljung, L. and Soderstrom, T.: Theory and Practice of Recursive Identification, MIT Press, Cambridge, MA, 1983.

58. Lonergan, T. and Ringwood, J. V.: Linguistic modeling of short-timescale electricity consumption using fuzzy modeling techniques, in: Proc. of Irish DSP and Control (ISPCC'97), Belfast, June 1995.

59. Lorenz, E.: Deterministic nonperiodic flow, J. Atmos. Sci. 20 (1963), 130–141.

60. Lu, C. N., Wu, N. T., and Vemuri, S.: Neural network-based short term load forecasting, IEEE Trans. Power Systems 8(1) (1993), 336–342.

61. Mastorokostas, P. A., Theocharis, J. B., Kiartzis, S. J., and Bakirtzis, A. G.: A hybrid fuzzy modeling method for short-term load forecasting, Math. Comput. Simulation 51(3/4) (2000), 221–232.

62. Matsushita, S., Furuhashi, T., Tsutsui, H., and Uchikawa, Y.: Efficient search for fuzzy models using genetic algorithm, J. Inform. Sci. 110 (1998), 41–50.

63. May, R.: Stability and Complexity in Model Ecosystems, Princeton Univ. Press, Princeton, 1973.

64. Medsker, L. R.: Hybrid Neural Networks and Expert Systems, Kluwer, Boston/Dordrecht, 1994.

65. Mees, A.: Dynamical systems and tessellations: Detecting determinism in data, Internat. J. Bifurcation Chaos 1 (1991), 777–794.

66. Moghram, I. and Rahman, S.: Analysis and evaluation of five short-term load forecasting methods, IEEE Trans. Power Systems 4 (1989), 1484–1491.

67. Montgomery, D. C. and Johnson, L. A.: Forecasting and Time Series Analysis, McGraw-Hill, New York, 1976.

68. Mori, H. and Kobayashi, H.: Optimal fuzzy inference for short-term load forecasting, IEEE Trans. Power Systems 11 (1996), 390–396.

69. Mori, H., Uematsu, H., Tsuzuki, S., Sakurai, T., Kojima, Y., and Suzuki, K.: Identification of harmonic loads in power systems using an artificial neural network, in: Proc. of the 2nd Symp. on Expert Systems Applications to Power Systems, July 1989, pp. 371–377.

70. Pang, C. K., Sheble, G. B., and Al Buyeh, F.: Evaluation of dynamic programming based methods and multiple area representation for thermal unit commitments, IEEE Trans. Power Apparatus Systems 100(3) (1981), 1212–1218.

71. Pang, T. M., Hubele, N. F., and Karady, G. G.: Advancement in the application of neural networks for short term load forecasting, IEEE Trans. Power Systems 8(1) (1992), 1195–1202.

72. Papalexopoulos, A. D., Hao, S., and Peng, T. M.: An implementation of a neural network based load forecasting model for the EMS, in: Proc. of 1994 IEEE PES Winter Meeting, Paper No. 94 WM 209-7 PWRS, February 1994.

73. Papalexopoulos, A. D. and Hesterberg, T. C.: A regression-based approach to short-term system load forecasting, IEEE Trans. Power Systems 5(4) (1990), 1535–1544.

74. Park, D. C., El-Sharkawi, M. A., Marks II, R. J., Atlas, L. E., and Damborg, M. J.: Electric load forecasting using an artificial neural network, IEEE Trans. Power Systems 6(2) (1991), 442–449.

75. Park, D., Kandel, A., and Langholz, G.: Genetic-based new fuzzy reasoning models with application to fuzzy control, IEEE Trans. Systems Man Cybernet. 24(1) (1994), 39–47.

76. Perneel, C., Themlin, J., Renders, J., and Acheroy, M.: Optimization of fuzzy expert systems using genetic algorithms and neural networks, IEEE Trans. Fuzzy Systems 3(3) (1993), 1330–1339.

77. Rahman, S. and Bhatnagar, B.: An expert system based algorithm for short-term load forecasting, IEEE Trans. Power Systems 3(2) (1988).


78. Rahman, S. and Hazin, O.: Generalized knowledge-based short-term load-forecasting tech-nique, IEEE Trans. Power Systems 8(2) (1993), 508–514.

79. Rahman, S. and Shrestha, G.: A priority vector based technique for load forecasting, IEEETrans. PWRS 6 (1991), 1459–1465.

80. Raptis, S. N. and Tzafestas, S. G.: Genetic evolution of neural networks using subpopulationschemes, in: Prof. of SOFTCOM ’98: IMACS/IFAC Internat. Symp. on Soft Computing inEngineering Applications, Athens, June 1998.

81. Saaty, T. L.: The Analytic Hierarchy Process, McGraw-Hill, New York, 1980.

82. Schaffer, J. D., Caruana, R. A., and Eshelman, L. J.: Using genetic search to exploit the emergent behavior of neural networks, Phys. D 42(1–3) (1990), 244–248.

83. Siarry, P. and Guely, F.: A genetic algorithm for optimizing Takagi–Sugeno fuzzy rule bases, Fuzzy Sets Systems 99 (1998), 37–47.

84. Simon, F.: Genetic-neuro-fuzzy systems: A promising fusion, in: Proc. of the 4th IEEE Conf. on Fuzzy Systems 1 (1995), 259–266.

85. Song, Y.-H., Johns, A., and Aggarwal, R.: Computational Intelligence Applications to Power Systems, Science Press and Kluwer, Dordrecht/Boston, 1996.

86. Srinivasan, D., Chang, C. S., and Liew, A. C.: Demand forecasting using fuzzy neural computation, with emphasis on weekend and public holiday forecasting, IEEE Trans. Power Systems 10(4) (1995), 1897–1903.

87. Stamou, G. B. and Tzafestas, S. G.: Fuzzy relation equations and fuzzy inference systems: An inside approach, IEEE Trans. Systems Man Cybernet. B 29(6) (1999), 694–702.

88. Stamou, G. B. and Tzafestas, S. G.: Neural fuzzy relational systems with a new learning algorithm, Math. Comput. Simulation 51(3/4) (2000), 301–314.

89. Sugeno, M. and Kang, G. T.: Structure identification of a fuzzy model, Fuzzy Sets Systems 28 (1988), 15–33.

90. Sugeno, M. and Tanaka, K.: Successive identification of a fuzzy model and its applications to prediction of a complex system, Fuzzy Sets Systems 42 (1991), 315–334.

91. Sugeno, M. and Yasukawa, T.: A fuzzy-logic-based approach to qualitative modeling, IEEE Trans. Fuzzy Systems 1(1) (1993), 7–31.

92. Takens, F.: Detecting strange attractors in turbulence, in: D. Rand and L. Young (eds), Proc. of Dynam. Systems and Turbulence Conf., Warwick, Springer, Berlin, 1980, pp. 366–381.

93. Terano, T., Asai, K., and Sugeno, M.: Fuzzy Systems Theory and its Applications, Academic Press, Boston, MA, 1992.

94. Tsoukalas, L.: Neurofuzzy approaches to anticipation: A new paradigm for intelligent systems, IEEE Trans. Systems Man Cybernet. B 28(4) (1998), 573–582.

95. Tzafestas, E. S., Nikolaidou, A., and Tzafestas, S. G.: Performance evaluation and dynamic node generation criteria for “Principal Component Analysis” neural networks, Math. Comput. Simulation 51(3/4) (2000), 145–156.

96. Tzafestas, E. S. and Tzafestas, S. G.: Intelligent forecasting and fault diagnosis using neural estimators, in: S. G. Tzafestas (ed.), Computational Intelligence in Systems and Control Design and Applications, Kluwer, Dordrecht/Boston, 1999, pp. 3–16.

97. Tzafestas, S. G. (ed.): Knowledge-Based System Diagnosis, Supervision and Control, Plenum Press, New York/London, 1989.

98. Tzafestas, S. G. (ed.): Knowledge-Based Systems: Advanced Concepts, Techniques and Applications, World Scientific, Singapore/London, 1997.

99. Tzafestas, S. G.: Soft Computing in Systems and Control Technology, World Scientific, Singapore/London, 1997.

100. Tzafestas, S. G., Dalianis, P. J., and Anthopoulos, J.: On the overtraining phenomenon of backpropagation networks, Math. Comput. Simulation 40(5/6) (1996), 507–521.

101. Tzafestas, S. G. and Dalianis, P. J.: A real-time expert data filtering system for industrial plant environments, Math. Comput. Simulation 41(5/6) (1996), 473–484.


102. Tzafestas, S. G., Magoulas, S., and Triantafyllakis, A.: An interactive advisory forecasting system, Found. Computing Decision Sci. 17(4) (1992), 235–255.

103. Tzafestas, S. G. and Mekras, N.: Industrial forecasting using knowledge-based techniques and artificial neural networks, in: S. G. Tzafestas (ed.), Advances in Manufacturing: Decision, Control and Information Technology, Springer, Berlin/London, 1999, pp. 171–180.

104. Tzafestas, S. G. and Raptis, S. N.: Genetic design of fuzzy systems based on a novel representation scheme, in: Proc. of CESA ’98: IMACS/IEEE Multi-Conference on Computer Engineering Systems and Applications, Hammamet, Tunisia, April 1998.

105. Tzafestas, S. G. and Raptis, S.: A combination of classical techniques on a SOM-type neural network platform, in: Proc. of EUSIPCO ’98: IX European Signal Processing Conf., Island of Rhodes, Greece, September 1998.

106. Tzafestas, S. G., Raptis, S., and Stamou, G.: A flexible neurofuzzy cell structure for general fuzzy inference, Math. Comput. Simulation 41(3/4) (1996), 219–233.

107. Tzafestas, S. G., Saltouros, M.-P., and Markaki, M.: A tutorial overview of genetic algorithms and their applications, in: Soft Computing in Systems and Control Technology, World Scientific, Singapore/London, 1999, pp. 223–300.

108. Tzafestas, S. G., Tzafestas, E. S., and Maragos, P.: Intelligent forecasting: Fuzzy/neurofuzzy methodologies with case studies, in: Proc. of the 3rd European Intelligent Forecasting Systems Workshop, Santorini, Greece, June 2000, pp. 104–113.

109. Tzafestas, S. G. and Venetsanopoulos, A.: Fuzzy Reasoning in Information, Decision and Control Systems, Kluwer, Dordrecht/Boston, 1994.

110. Tzafestas, S. G. and Zikidis, K. C.: An on-line learning, neuro-fuzzy architecture, based on functional reasoning and fuzzy ARTMAP, in: Proc. ICSC Symp. on Fuzzy Logic and Applications (ISFL’97), Zurich, Switzerland, 1997.

111. Tzafestas, S. G. and Zikidis, K. C.: An on-line self-constructing fuzzy modeling architecture based on neural and fuzzy concepts and techniques, in: Soft Computing in Systems and Control Technology, World Scientific, Singapore, 1999, pp. 119–168.

112. Vemuri, S., Huang, L., and Nelson, D. J.: On line algorithms for forecasting hourly loads of an electric utility, IEEE Trans. Power Appar. Systems 100 (1981), 3775–3784.

113. Wang, L. and Yen, J.: Extracting fuzzy rules for system modeling using a hybrid of genetic algorithms and Kalman filter, Fuzzy Sets Systems 101 (1999), 353–362.

114. Watkins, P. R. and Eliot, L. B.: Expert Systems in Business and Finance: Issues and Applications, Wiley, Chichester/New York, 1993.

115. Weigend, A. S., Huberman, B. A., and Rumelhart, D. E.: Predicting the future: A connectionist approach, Internat. J. Neural Systems 1(3) (1990).

116. Wong, K. P.: Expert systems in engineering applications, in: S. G. Tzafestas (ed.), Application of AI and Expert Systems in Power Engineering, Springer, Berlin, 1993, Chapter 7.

117. Wu, C.-J. and Liu, C.-Y.: A genetic approach for simultaneous design of membership functions and fuzzy control rules, J. Intell. Robotic Systems 28(3) (2000), 195–211.

118. Yager, R. and Zadeh, L. A.: An Introduction to Fuzzy Logic Applications in Intelligent Systems,Kluwer, Boston/Dordrecht, 1992.

119. Carpinteiro, O. A. S. and Alves Da Silva, A. P.: A hierarchical self-organizing map model in short-term load forecasting, J. Intell. Robotic Systems 31(1–3) (2001), (this issue) 105–113.