


IEEE Transactions on Power Systems, Vol. 13, No. 1, February 1998

Recurrent Neural Networks for Short-Term Load Forecasting

J. Vermaak, Student Member, IEEE
E.C. Botha, Member, IEEE

Department of Electrical and Electronic Engineering
University of Pretoria
South Africa

Abstract— Forecasting the short-term load entails the construction of a model, and, using the information available, estimating the parameters of the model to optimize the prediction performance. It follows that the more closely the chosen model approximates the actual physical generating process, the higher the expected performance of the forecasting system. In this paper it is postulated that the load can be modeled as the output of some dynamic system, influenced by a number of weather, time and other environmental variables. Recurrent neural networks, being members of a class of connectionist models exhibiting inherent dynamic behavior, can thus be used to construct empirical models for this dynamic system. Because of the nonlinear dynamic nature of these models, the behavior of the load prediction system can be captured in a compact and robust representation. This is illustrated by the performance of recurrent models on the short-term forecasting of the nation-wide load for the South African utility, ESKOM. A comparison with feedforward neural networks is also given.

I. INTRODUCTION

Forecasting the short-term load entails the construction of an appropriate model, and, using all the a priori information available (e.g. past values of the load, environmental parameters, etc.), estimating the parameters of the model to optimize the prediction performance. In this sense, artificial neural networks have proven to be especially suited for load forecasting applications, as they do not rely on any explicit rules or predefined mathematical relationships between the model inputs and outputs. Rather, they attempt to form natural links between the inputs and outputs through a process of self-learning, based on a collection of input and desired output patterns.

PE-049-PWRS-1-04-1997. A paper recommended and approved by the IEEE Power System Engineering Committee of the IEEE Power Engineering Society for publication in the IEEE Transactions on Power Systems. Manuscript submitted December 22, 1995; made available for printing April 11, 1997.

The most common, by far, of these neural network structures is the three-layer, feedforward model, trained by the error backpropagation algorithm [1]. Its popularity stems from the fact that it can approximate any continuous function with arbitrary accuracy [2]. It has subsequently been employed in a great number of different load forecasting applications, and has consistently attained a higher success rate than most of the conventional approaches [3]-[6]. One very important characteristic of this structure is that it is of a zero-memory nature. It thus forms a purely static mapping between the network inputs and outputs, and has no knowledge about its previous states, or the influence of previous inputs.

Even from an intuitive perspective, however, it is evident that the nature of the load is dynamic, rather than static. The change in the load is not only influenced by the external weather and time variables, but is also highly dependent on the past and current load state. Thus, in this scheme, static neural networks are suboptimal load forecasting models, and previous load state information has to be incorporated by presenting the network with the appropriate past load values.

In nature, complex dynamic patterns, like the system load profile, are often the result of a relatively simple underlying generating mechanism. These observations are usually best described in terms of some or other (nonlinear) differential equation, rather than a functional relationship between the input and output data. However, in most circumstances, the particular differential equation is either completely unknown, or very difficult to estimate. In this sense, recurrent neural networks, being members of a class of neural network models exhibiting inherent dynamic behavior, can be used to construct empirical models for the load as a dynamic system. Because of the nonlinear nature of these models, the behavior of the load prediction system can be captured in a compact, robust and more natural representation.

This paper investigates the success of applying a class of recurrent network models, the fully-connected recurrent neural networks, to the short-term load forecasting problem. Section II presents the necessary theoretical concepts pertaining to the recurrent network structure and subsequent training. In Section III, important issues relevant to the application of these models to the short-term load forecasting problem are discussed. A comparative experiment (with normal feedforward networks), together with numeric and graphical results, is presented and discussed in Section IV, and finally, a conclusion is reached in Section V.


Fig. 1. Fully-connected recurrent neural network (RNN) with m = 3 neurons.

Fig. 2. Individual neuron structure for the discrete-time, fully-connected recurrent neural network.

II. RECURRENT NEURAL NETWORKS

Recurrent neural networks are members of a class of neural network models exhibiting inherent dynamic behavior. The most general of these is the fully-connected recurrent neural network, an example of which is depicted in Figure 1. Each neuron (N_i) is connected to every other neuron (N_j), including itself, via the appropriate weights (w_{ij}). Consider neuron i, shown in more detail in Figure 2.

The recursive equation describing the dynamic behavior of the neuron state can be derived from the construction of the neuron in Figure 2 to be

$$s_i(k+1) = \sum_{j=1}^{m} w_{ij}\, g_j(s_j(k)) + J_i(k+1), \quad i = 1, \ldots, m, \qquad (1)$$

where the following notation is used: m is the number of neurons in the network, k is the discretized time index, s_i(k) is the state of neuron i at time step k, J_i(k) is the input to neuron i at time step k, w_{ij} is the connection weight from neuron j to neuron i, and g_i(·) is the neuron activation function of the i-th neuron. The most common of these are the linear and sigmoidal transfer functions.

In vector notation form, Equation (1) becomes

$$s(k+1) = W g(s(k)) + J(k+1), \quad k_0 \le k \le k_f, \qquad (2)$$

where [k_0, k_f] is the simulation range of interest.

Fig. 3. General description of the recurrent neural network system.

To generalize the recurrent neural network description, input and output equations can be defined as

$$J(k) = F_1[u(k)] \qquad (3)$$

and

$$y(k) = F_2[s(k)], \qquad (4)$$

respectively, where u(k) ∈ R^p is the external input to the system, and y(k) ∈ R^n is the output of the system, with p, n ≤ m. This generalization results in the recurrent neural network system depicted in Figure 3, the dynamic behavior of which is uniquely determined by (2), (3), (4) and y(k_0), the initial output of the system (this is true because the initial state of the recurrent neural network, s(k_0), can be obtained from y(k_0) and the initial input, u(k_0), by the inverse of (4)).
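As an illustration of these dynamics, the following sketch (Python/NumPy) iterates the state recursion of (2) with linear input and output maps for (3) and (4). This is a minimal sketch only: the 3-neuron size, the random weights, and the simple initialization are illustrative assumptions, not the trained models of this paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_rnn(W, s0, u_seq, F1, F2):
    """Iterate s(k+1) = W g(s(k)) + J(k+1) of Eq. (2), with
    J(k) = F1[u(k)] and y(k) = F2[s(k)] as in Eqs. (3)-(4)."""
    s = s0
    outputs = [F2(s)]
    for u_next in u_seq[1:]:              # external input u(k+1)
        s = W @ sigmoid(s) + F1(u_next)   # J(k+1) = F1[u(k+1)]
        outputs.append(F2(s))
    return np.array(outputs)

# Illustrative 3-neuron network (m = 3), as in Fig. 1.
rng = np.random.default_rng(0)
m, p, n = 3, 2, 1
W = rng.normal(scale=0.5, size=(m, m))   # fully-connected weights w_ij
J2 = rng.normal(scale=0.5, size=(m, p))  # hypothetical linear input map
b = np.zeros(m)
C = rng.normal(size=(n, m))              # hypothetical linear output map

F1 = lambda u: J2 @ u + b                # J(k) = F1[u(k)]
F2 = lambda s: C @ s                     # y(k) = F2[s(k)]

u_seq = rng.normal(size=(10, p))         # u(k0), ..., u(kf)
s0 = F1(u_seq[0])                        # illustrative initialization
y = simulate_rnn(W, s0, u_seq, F1, F2)   # y(k0), ..., y(kf)
```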

This system can be shown (see [7]) to be equivalent to the structure depicted in Figure 4. It exhibits the same dynamic behavior as the fully-connected recurrent neural network defined by (2), with input and output equations defined by

$$J(k) = F_1[u(k)] = J_2 u(k) + b \qquad (5)$$

and

$$y(k) = F_2[s(k)] = W_1^{+}\,[s(k) - J(k)], \qquad (6)$$

respectively, and

$$W = W_1 W_2. \qquad (7)$$

In the above description, J_2 and W_1 are the weight matrices from the system input and output neurons, respectively, in the input layer, to the units of the hidden layer; W_1^+ is the pseudo-inverse of W_1; b is the bias input vector to the hidden layer neurons; and W_2 is the weight matrix from the hidden layer neurons to the system output neurons in the output layer. Sigmoidal transfer functions are employed for the units in the hidden layer (g(·)).

Fig. 4. Three-layer feedforward neural network, with time-delays on the output layer and output feedback.

Thus, from Figure 4, it is evident that the recurrent network parameters can be optimized by training a feedforward network to learn the mapping [y(k), u(k)] → y(k+1), which can be written in terms of the network parameters as

$$y(k+1) = W_2\, g(W_1 y(k) + J_2 u(k) + b). \qquad (8)$$
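To make the training procedure concrete, the sketch below (Python/NumPy; the shapes, names and data are illustrative assumptions, not the authors' code) assembles the supervised pairs ([y(k), u(k)], y(k+1)) and evaluates the one-step map of Equation (8). Once the feedforward weights have been trained, the recurrent weight matrix follows from the factorization given above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def one_step(y_k, u_k, W1, J2, b, W2):
    """One-step prediction of Eq. (8): y(k+1) = W2 g(W1 y(k) + J2 u(k) + b)."""
    return W2 @ sigmoid(W1 @ y_k + J2 @ u_k + b)

def training_pairs(y_series, u_series):
    """Supervised pairs ([y(k), u(k)], y(k+1)) for feedforward training."""
    X = [np.concatenate([y_series[k], u_series[k]])
         for k in range(len(y_series) - 1)]
    T = [y_series[k + 1] for k in range(len(y_series) - 1)]
    return np.array(X), np.array(T)

# Illustrative dimensions: n outputs, p external inputs, h hidden units.
rng = np.random.default_rng(1)
n, p, h = 1, 2, 4
W1 = rng.normal(size=(h, n))   # output-feedback weights into hidden layer
J2 = rng.normal(size=(h, p))   # external-input weights into hidden layer
b = np.zeros(h)
W2 = rng.normal(size=(n, h))   # hidden-to-output weights

y_series = rng.normal(size=(10, n))
u_series = rng.normal(size=(10, p))
X, T = training_pairs(y_series, u_series)   # patterns for backpropagation
y_next = one_step(y_series[0], u_series[0], W1, J2, b, W2)
```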

III. APPLICATION TO SHORT-TERM LOAD FORECASTING

This section will explore some of the practical issues involved when constructing recurrent neural network models for short-term load forecasting applications. The desired model output is assumed to be the hourly system integrated load values for a number of hours into the future.

A. Selecting Network Input Variables

Probably one of the most difficult tasks in the design of the network structure is the selection of appropriate network inputs. Because the dynamic behavior of the network is highly dependent on the chosen inputs, the load has to exhibit a strong degree of statistical correlation with these variables. It is also very important that the set of network inputs adequately represents all the external factors influencing the system load. Thus, the process of selecting the relevant network inputs has to be guided by an intuitive knowledge of the various influencing factors, together with a careful numerical validation of these assumptions.

Prevalent weather patterns have a significant impact on the nature of the load profile. Thus, the inclusion of weather variables in the network inputs can significantly improve the prediction performance. Typical weather variables include temperature information (hourly, minimum, maximum, average, etc.), humidity, rainfall, wind velocity, sky condition indicators, and many more. The most important of these are the temperature variables, representing the strongest correlation with weather-related load variations. Temperatures can, in general, also be measured to a higher degree of accuracy relative to any of the other weather variables. Many of the forecasting models proposed in the literature employ hourly-varying temperature variables [8][9]. This holds practical limitations in the sense that any increase in performance gained by using an hourly-varying temperature variable is offset by the lack of accuracy in forecasting these values. A much more realistic model is one that relies on temperature variables that are updated at a frequency of, at most, once a day.

There is also a very strong dependence of the load on time. The daily load profile for a specific day retains essentially the same shape, having more or less the same value for a given hour. These fluctuations are mostly caused by localized (in time) weather effects. The properties of these profiles also exhibit variations as a result of seasonal changes in the weather patterns. Thus, it is evident that the inclusion of time variables is essential if the prediction accuracy is to be maximized. These include hour of the day, day of the week, season of the year, etc.

The load shape is also influenced by a vast number of other external influences. The magnitude of the induced load variation is dependent on the system impact of the specific influence. For example, the "brown-out" of a single 11 kVA domestic transformer may have a negligible effect on the nation-wide load, whereas the influence of a national holiday will be clearly visible. Sometimes these effects can be quantified and presented to the network as an additional input. In these cases the network forms an adequate internal representation, linking the forecasted load to the specific external influence. More often than not, such a simple technique does not suffice, due to the lack of sufficient data pertaining to a specific external influence, and calls for alternative modeling techniques for these situations [10][11].

B. Scaling the Network Variables

Due to the nature of the sigmoidal transfer function, the outputs of the neurons in the hidden layer are limited to values between zero and one. Thus, allowing large values for the neuron input variables will cause the threshold functions to be driven into saturation frequently, resulting in an inability to train. In practical implementations this problem is solved by scaling the network inputs and outputs to an appropriate range (usually between zero and one).

Care has to be taken not to destroy vital relationships between different network variables by the scaling techniques employed. Thus, as a rule of thumb, if two network variables represent the same physical parameter (e.g. temperature), albeit for different time instances or geographical locations, they should be scaled according to the same strategy, using the same scaling parameters.
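As a brief illustration of this rule of thumb, the following sketch (Python/NumPy) fits one linear scaler on the pooled data for a single physical quantity and applies the same parameters to every variable representing it. The temperature values are illustrative; only the [0, 3] target range is taken from Section IV.

```python
import numpy as np

def fit_linear_scaler(values, lo=0.0, hi=1.0):
    """Return (a, c) such that a * x + c maps the observed values onto [lo, hi]."""
    vmin, vmax = float(np.min(values)), float(np.max(values))
    a = (hi - lo) / (vmax - vmin)
    return a, lo - a * vmin

# Illustrative daily maximum temperatures (deg C) for two sites.
temps_jhb = np.array([16.0, 22.5, 25.0, 19.0])
temps_ct = np.array([14.0, 18.0, 27.5, 21.0])

# Fit ONE scaler on the pooled data for the physical quantity
# "temperature", then apply the SAME parameters to every temperature
# variable, so inter-site and inter-day relationships are preserved.
a_T, c_T = fit_linear_scaler(np.concatenate([temps_jhb, temps_ct]),
                             lo=0.0, hi=3.0)
jhb_scaled = a_T * temps_jhb + c_T
ct_scaled = a_T * temps_ct + c_T
```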

C. Encoding the Network Variables

Variable encoding pertains to the way the inputs and outputs are presented to the network. The most important distinction here is that between continuous and binary encodings, each having significance for different types of network variables.

Continuous encoded variables include, amongst others, temperature and load values, where the numeric value of the encoded variable has some relationship (usually linear) with the physical variable or variables being represented.

As opposed to this, binary encoded variables hold no quantitative information. Variables encoded in this way are those that provide non-numeric system state information (e.g. what day of the week it is), or indicate whether a certain event has occurred. Thus, the relative importance of the network variable is not determined by its numeric value (e.g. for day-type variables, day 5 is not 5 times as important as day 1). In these situations a binary representation allows a more natural relationship to be learned, at the cost of a possible increase in the dimensionality of the network input space.
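The two encodings can be made concrete with a small sketch (Python/NumPy). The 000-110 weekday convention and the [0, 2.3] hour range come from Section IV; the exact bit ordering is an assumption for illustration.

```python
import numpy as np

def encode_weekday_binary(day_index):
    """3-bit binary code 000 (Sunday) .. 110 (Saturday); the bit ordering
    used here is an assumption, only the coding range is from the text."""
    return np.array([(day_index >> i) & 1 for i in (2, 1, 0)], dtype=float)

def encode_hour_continuous(hour):
    """Hour 0..23 as a continuous value in [0, 2.3]."""
    return hour / 10.0

# Day 5 becomes "101", not the number 5: the binary code removes the
# false ordinal relationship a continuous day encoding would impose.
weekday = encode_weekday_binary(5)    # -> [1., 0., 1.]
hour = encode_hour_continuous(17)     # -> 1.7
x = np.concatenate([weekday, [hour]])
```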

D. Selecting the Training Data

Because the recurrent neural network forms its internal representation relative to the training data, it is important that the selected data closely match the circumstances of the time period to be predicted. From a theoretical point of view, the best network generalization can be obtained by using all of the training samples pertaining to a certain set of conditions. These large data sets, however, often lead to practical difficulties during training. One strategy to work around this problem is to divide the forecasting into a number of subproblems, each to be solved by a different network (e.g. one for each day-type). Each of the resulting networks is trained with as large a data set as is practically possible. This allows the network to capture information about the system states and important load trends that is simply not possible with smaller training sets (due to under-representation), and each of the individual networks can be optimized for the specific subproblem.

The moving window data selection strategy was proposed to reduce the sizes of the training sets required. In this approach the most recent data (e.g. that for the previous two weeks) is used to train the network for the required lead time prediction (mostly 24 hours to one week ahead). For each new forecast the training data is different, and the network is subsequently retrained. This method rests, of course, on the assumptions that the load is relatively stationary within the selected time frame, and that weather patterns (or other external influences) are adequately represented by the training data. This is not always the case, and one can intuitively conclude that the prediction performance will deteriorate as the lead time increases and rapidly changing weather patterns (e.g. cold or warm fronts, seasonal transition effects, etc.) are encountered.
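A minimal sketch of the moving-window selection follows (Python/NumPy). The three-week window and one-week lead time match the experiment of Section IV; the function names and the stand-in load series are ours.

```python
import numpy as np

def moving_window_sets(series, k_now, window_hours=3 * 168, lead_hours=168):
    """Moving-window selection: train on the most recent `window_hours`
    of data before the forecast origin `k_now`, then forecast the next
    `lead_hours`. The network is retrained for every new forecast."""
    train = series[k_now - window_hours:k_now]
    target_range = (k_now, k_now + lead_hours)
    return train, target_range

# E.g. three weeks of hourly load history to forecast one week ahead.
load = np.arange(24 * 365, dtype=float)   # stand-in hourly load series
train, (k0, kf) = moving_window_sets(load, k_now=24 * 200)
```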

E. Evaluating the Prediction Performance

The final step in the design procedure is the assessment of the forecasting performance of the trained networks. This evaluation is typically done by quantifying the prediction error obtained on an independent data set. If training was successful, the networks will be able to generalize, resulting in a high accuracy in the forecasting of unknown patterns (provided the training data is sufficiently representative of the forecasting situation). Various error metrics (distance measures) between the actual and forecasted load are defined, but the one most commonly adopted by load forecasters is the absolute percentage error (APE), defined by

$$\mathrm{APE}_k = \frac{|l_k - \hat{l}_k|}{l_k} \times 100\%, \qquad (9)$$

where l_k and \hat{l}_k are the actual and forecasted load at time instance k, respectively. This error measure is more meaningfully presented as an average and standard deviation over the forecasting range of interest. More insight into the forecasting performance can also be obtained by examining the distribution functions (probability and cumulative) of the absolute percentage error. An additional measure of the error is defined from the cumulative distribution function as the 90th percentile of the absolute percentage error. This, in essence, provides an indication of the behavior of the tail of the distribution of errors, and indicates that only 10% of the errors exceed this value.
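The three figures reported in Section IV (mean, standard deviation and 90th percentile of the APE) follow directly from Equation (9). A minimal sketch, with illustrative load values:

```python
import numpy as np

def ape(actual, forecast):
    """Absolute percentage error of Eq. (9) for each hour k."""
    return 100.0 * np.abs(actual - forecast) / actual

def summarize(actual, forecast):
    """Mean, standard deviation and 90th percentile of the APE,
    the three figures reported in Tables I and II."""
    e = ape(actual, forecast)
    return e.mean(), e.std(), np.percentile(e, 90)

actual = np.array([23000.0, 24500.0, 26100.0])    # illustrative loads (MW)
forecast = np.array([22700.0, 25100.0, 25800.0])
mu, sigma, p90 = summarize(actual, forecast)
```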

IV. EXPERIMENTAL PROCEDURE AND RESULTS

In this section the description and results of a comparative experiment between recurrent and purely feedforward neural networks for the short-term load forecasting problem are presented. The data used was extracted from the hourly integrated values of the total nation-wide load profile for the South African utility, ESKOM, for the year 1994. In addition, the corresponding daily maximum and minimum temperatures for four of the major load centers (Johannesburg, Cape Town, Durban and Bloemfontein) were used as representative indicators of the South African weather conditions during the same period.

The forecasting was performed with a lead time of one week (168 hours) for 21 weeks, starting on July 1, 1994. Training data was selected according to the moving window strategy from the load and temperature data of the three weeks prior to the week of the forecast. Thus, both a recurrent and a purely feedforward network were trained for each of the forecast weeks. Only normal days were considered; special holidays or other unusual events, such as elections, etc., require special modeling techniques. This is mainly due to the lack of data needed to construct a sufficiently general model for these conditions.

All feedforward neural networks (including those used for the recurrent network training) employed a single hidden layer, and were trained in batch mode according to the error backpropagation algorithm, using conjugate gradient descent optimization [12]. In all cases, the number of neurons in the hidden layer was varied until the prediction performances were satisfactory on both the training and evaluation data sets. These performances are quantified in terms of the mean, standard deviation, and 90th percentile of the absolute percentage error.

Each of the individual networks had only one output, the forecasted load for a specific hour. The nature and number of network input variables differed for the recurrent and feedforward neural network models. The following set of input variables resulted in the best performance for the feedforward networks.

The past load values, l(n−1), l(n−2), l(n−3), and l(n−24), where l(n−i) represents the load value i hours before the forecasting time instant, n. These variables were presented to the network as continuous variables, linearly scaled to fall within the range [0.5, 5]. The same scale parameters were used for all the load inputs, as well as the forecasted load output.

An hour index, represented as a continuous value within the range [0, 2.3], for hours ranging from 0 to 23.

A weekday indicator, represented as a 3-bit binary code (000 to 110) for the weekdays from Sunday to Saturday.

The average minimum and maximum temperatures for the forecasting day and one day earlier, T_min^avg(d−j) and T_max^avg(d−j), j = 0, 1, where these variables are defined as

$$T_v^{\mathrm{avg}}(d-j) = \frac{1}{4}\left[T_v^{\mathrm{JHB}}(d-j) + T_v^{\mathrm{CT}}(d-j) + T_v^{\mathrm{DBN}}(d-j) + T_v^{\mathrm{BFN}}(d-j)\right], \qquad (10)$$

with v ∈ {min, max}, d the day of the forecast, and j the delay in number of days. These variables were also presented to the network as linearly scaled values in the range [0, 3]. The delayed temperature values were included to model the load inertia with temperature variations.

Two temperature change variables (to model the sensitivity of the load to temperature changes), defined as

$$\Delta T_v^{\mathrm{avg}}(d) = T_v^{\mathrm{avg}}(d) - T_v^{\mathrm{avg}}(d-1). \qquad (11)$$

These were linearly scaled to fall within the range [−0.5, 0.5], and presented to the network as continuous variables.

TABLE I
FORECASTING RESULTS FOR THE FEEDFORWARD NETWORKS

Week      Net Size       μ_APE    σ_APE    90th-%
week 6    15 x 9 x 1     2.3512   1.9260   4.5378
week 7    15 x 8 x 1     2.3288   2.0094   4.8716
week 8    15 x 12 x 1    3.0858   2.3187   6.2668
week 9    15 x 7 x 1     3.4593   3.2967   8.0817
…         …              …        …        …
week 17   15 x 7 x 1     4.4217   2.9806   8.1794
week 18   15 x 12 x 1    3.3759   2.6754   6.8681
week 19   15 x 10 x 1    2.9956   2.4668   5.9222
week 20   15 x 7 x 1     3.2141   2.6246   6.9955
week 21   15 x 8 x 1     1.6606   1.4572   3.4705
Average                  2.5673   2.0689   5.2619

Thus, including the network bias node, each of the feedforward networks had 15 input neurons. For the recurrent networks a much reduced set of input variables is possible, as these models form an internal representation of the load state and the effects of the external influences. Thus, only the two time variables (hour and weekday indicator), along with the average minimum and maximum temperatures, as defined above, for the forecasting day, were employed. In addition, it was found that presenting the weekday input as a continuous value (between 0 and 6 for Sunday to Saturday) actually improved the results, while further reducing the number of inputs. A second order approximation for the dynamic load model provided adequate accuracy, resulting in two delayed load values being employed during the feedforward training of the recurrent networks.
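One plausible reading of the two input sets is sketched below (Python/NumPy). The dictionary keys and the divide-by-10 scaling convention are assumptions for illustration; the variable counts of 15 and 7 (bias included) follow the text and the net sizes in Tables I and II.

```python
import numpy as np

def feedforward_inputs(load, temps, hour, day, n):
    """The 15 feedforward inputs (bias included): four past loads, the
    hour index, the 3-bit weekday code, the four averaged daily
    temperatures of Eq. (10) and the two temperature changes of Eq. (11).
    `load` is the scaled hourly series; `temps` uses hypothetical keys."""
    past_load = [load[n - 1], load[n - 2], load[n - 3], load[n - 24]]
    weekday = [float((day >> i) & 1) for i in (2, 1, 0)]
    temp_vars = [temps["tmin_d0"], temps["tmax_d0"],    # forecast day
                 temps["tmin_d1"], temps["tmax_d1"],    # one day earlier
                 temps["dtmin"], temps["dtmax"]]        # Eq. (11)
    return np.array([1.0] + past_load + [hour / 10.0] + weekday + temp_vars)

def recurrent_inputs(load, temps, hour, day, n):
    """The reduced 7 inputs of the recurrent model's equivalent feedforward
    network: bias, two fed-back load values (second-order load model),
    hour, a single continuous weekday value, and the forecast day's
    average minimum and maximum temperatures."""
    return np.array([1.0, load[n - 1], load[n - 2], hour / 10.0,
                     float(day), temps["tmin_d0"], temps["tmax_d0"]])

load = np.linspace(0.5, 5.0, 48)   # scaled stand-in load history
temps = dict(tmin_d0=1.2, tmax_d0=2.4, tmin_d1=1.1, tmax_d1=2.2,
             dtmin=0.1, dtmax=0.2)
x_ff = feedforward_inputs(load, temps, hour=9, day=2, n=30)   # 15 values
x_rec = recurrent_inputs(load, temps, hour=9, day=2, n=30)    # 7 values
```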

The forecasting results for the feedforward and recurrent network models are presented in Tables I and II, respectively. The results show that recurrent networks, on average 68% the size of the comparative feedforward networks, with half the number of inputs, produce errors that are comparable to or better than those of the feedforward networks. This fact is further illustrated by the comparative cumulative distribution functions of the prediction error, depicted in Figure 5, revealing a much steeper slope for the recurrent networks. An example of the feedforward and recurrent predicted profiles, together with the actual load, is given in Figure 6.


TABLE II
FORECASTING RESULTS FOR THE RECURRENT NETWORKS

Week      Net Size*      μ_APE    σ_APE    90th-%
week 9    7 x 14 x 1     3.4453   3.3615   7.4667
week 10   7 x 16 x 1     2.3694   2.1702   5.3451
week 11   7 x 15 x 1     1.7493   1.4204   3.7655
week 12   7 x 10 x 1     1.9509   1.6162   3.8416
week 13   7 x 14 x 1     2.4220   2.3794   5.1853
…         …              …        …        …
week 20   7 x 13 x 1     1.5909   1.3966   3.5860
week 21   7 x 14 x 1     0.9316   0.7785   1.9595
Average                  2.0237   1.8564   4.2639

*Denotes the size of the equivalent feedforward network used during training.


Fig. 5. Cumulative distribution of the absolute percentage error (APE) for the feedforward and recurrent networks over all the samples in the evaluation data sets.


Fig. 6. Actual, feedforward, and recurrent forecasted load profiles for the week starting July 8, 1994.

V. CONCLUSIONS

In this paper we have shown how a recurrent neural network architecture can be applied to the short-term load forecasting problem. The recurrent network successfully captured the dynamic behavior of the load, resulting in a more compact and natural internal representation of the temporal information contained in the load profile than what is possible with a normal feedforward network.

Furthermore, the chosen recurrent network architecture facilitates a training method of no more than feedforward complexity, making the optimization of these networks even faster than that of the comparative feedforward networks, because they are generally smaller. This also holds true for the implementation phase, as the equivalent structure of Figure 4 shows that the state recursive equation, together with the input and output mappings, can be implemented by a feedforward structure.

The deliberate avoidance of special days, such as holidays, is due to the fact that these occurrences can be categorized as outliers. Because of their (desirable) generalization property, neural networks, whether static or recurrent, tend to filter out these outliers. If enough data about the specific occurrence is available, a separate network can be constructed. More often than not, this is not the case, and the forecasting model needs to be augmented by structures with a higher level of intelligence, such as fuzzy logic networks or expert systems. This was not done here, as the emphasis was on the comparison of recurrent and feedforward networks.

Even though the results are promising, more research still needs to be done before any conclusive statements concerning the appropriateness of each of the models can be made. This includes the investigation of various other recurrent network topologies and training paradigms, together with an analysis of conditions for stability. Furthermore, good data selection strategies and network input and output representations have to be determined. In conclusion, it is our hope that the field of recurrent neural networks has opened a new and exciting dimension in the discipline of load forecasting.

REFERENCES

[1] D. Rumelhart and J. McClelland, Parallel Distributed Processing, vol. 1. Cambridge, MA: MIT Press, 1987.

[2] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, pp. 359-366, 1989.

[3] D. C. Park, M. A. El-Sharkawi, R. J. Marks II, L. E. Atlas, and M. J. Damborg, "Electric load forecasting using an artificial neural network," IEEE Transactions on Power Systems, vol. 6, pp. 442-449, May 1991.

[4] T. M. Peng, N. F. Hubele, and G. G. Karady, "Advancement in the application of neural networks for short-term load forecasting," IEEE Transactions on Power Systems, vol. 7, pp. 250-257, February 1992.

[5] O. Mohammed, D. Park, R. Merchant, T. Dinh, C. Tong, A. Azeem, J. Farah, and C. Drake, "Practical experiences with an adaptive neural network short-term load forecasting system," IEEE Transactions on Power Systems, vol. 10, pp. 254-265, February 1995.

[6] A. D. Papalexopoulos, S. Hao, and T.-M. Peng, "An implementation of a neural network based load forecasting model for the EMS," IEEE Transactions on Power Systems, vol. 9, pp. 1956-1962, November 1994.

[7] O. Olurotimi, "Recurrent neural network training with feedforward complexity," IEEE Transactions on Neural Networks, vol. 5, pp. 185-197, March 1994.

[8] S.-T. Chen, D. C. Yu, and A. R. Moghaddamjo, "Weather sensitive short-term load forecasting using nonfully connected artificial neural network," IEEE Transactions on Power Systems, vol. 7, pp. 1098-1104, August 1992.

[9] C. N. Lu, H. T. Wu, and S. Vemuri, "Neural network based short term load forecasting," IEEE Transactions on Power Systems, vol. 8, pp. 336-342, February 1993.

[10] D. Srinivasan, A. C. Liew, and C. S. Chang, "Forecasting daily load curves using a hybrid fuzzy-neural approach," IEE Proc.-Gener. Transm. Distrib., vol. 141, pp. 561-567, November 1994.

[11] K.-H. Kim, J.-K. Park, K.-J. Hwang, and S.-H. Kim, "Implementation of hybrid short-term load forecasting systems using artificial neural networks and fuzzy expert systems," IEEE Transactions on Power Systems, vol. 10, pp. 1534-1539, August 1995.

[12] M. J. D. Powell, "Restart procedures for the conjugate gradient method," Mathematical Programming, vol. 12, pp. 241-254, April 1977.

Jaco Vermaak (Student Member, IEEE) was born in Pretoria, South Africa, in 1969. He received the B.Eng. Degree in Electronic Engineering and the B.Eng. (Honours) Degree in Computer Engineering from the University of Pretoria, South Africa, in 1993 and 1994, respectively. In 1995 he joined the Department of Electrical and Electronic Engineering at the University of Pretoria as a lecturer in the Computers and Pattern Recognition Group. He is currently working towards the completion of the M.Eng. Degree in Electrical Engineering at the same university.

Elizabeth C. Botha (Member, IEEE) was born in Pretoria, South Africa. She received her B.Eng. (Electronics) and M.Eng. (Electronics) Degrees from the University of Pretoria in 1983 and 1985, respectively. In 1989 she graduated with a PhD in Electrical and Computer Engineering from Carnegie Mellon University, Pittsburgh, Pennsylvania. She is currently professor and head of the Computers and Pattern Recognition Group in the Department of Electrical and Electronic Engineering at the University of Pretoria. Her research interests include pattern recognition, image processing, neural networks and optical information processing. She has published 15 papers in international journals and has acted as consultant to industry in all these fields.