forecasting day-ahead electricity prices in europe: the ...bdeschutter/pub/rep/17_021.pdf ·...

Delft University of Technology

Delft Center for Systems and Control

Technical report 17-021

Forecasting day-ahead electricity prices in

Europe: The importance of considering

market integration∗

J. Lago, F. De Ridder, P. Vrancx, and B. De Schutter

If you want to cite this report, please use the following reference instead:

J. Lago, F. De Ridder, P. Vrancx, and B. De Schutter, “Forecasting day-ahead elec-

tricity prices in Europe: The importance of considering market integration,” Applied

Energy, vol. 211, pp. 890–903, 2018.

Delft Center for Systems and Control

Delft University of Technology

Mekelweg 2, 2628 CD Delft

The Netherlands

phone: +31-15-278.24.73 (secretary)

URL: https://www.dcsc.tudelft.nl

∗This report can also be downloaded via https://pub.deschutter.info/abs/17_021.html

https://www.dcsc.tudelft.nl

https://pub.deschutter.info/abs/17_021.html

Forecasting day-ahead electricity prices in Europe: the importance of

considering market integration

Jesus Lagoa,b,1, Fjo De Ridderb, Peter Vrancxc, Bart De Schuttera

aDelft Center for Systems and Control, Delft University of Technology,

Mekelweg 2, 2628CD Delft, The NetherlandsbEnergy Technology, VITO-Energyville, ThorPark, 3600 Genk, BelgiumcAI Lab, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium

Abstract

Motivated by the increasing integration among electricity markets, in this paper we propose two differentmethods to incorporate market integration in electricity price forecasting and to improve the predictiveperformance. First, we propose a deep neural network that considers features from connected markets toimprove the predictive accuracy in a local market. To measure the importance of these features, we proposea novel feature selection algorithm that, by using Bayesian optimization and functional analysis of variance,evaluates the effect of the features on the algorithm performance. In addition, using market integration, wepropose a second model that, by simultaneously predicting prices from two markets, improves the forecastingaccuracy even further. As a case study, we consider the electricity market in Belgium and the improvementsin forecasting accuracy when using various French electricity features. We show that the two proposedmodels lead to improvements that are statistically significant. Particularly, due to market integration, thepredictive accuracy is improved from 15.7% to 12.5% sMAPE (symmetric mean absolute percentage error).In addition, we show that the proposed feature selection algorithm is able to perform a correct assessment,i.e. to discard the irrelevant features.

Keywords: Electricity Price Forecasting, Electricity Market Integration, Deep Neural Networks,Functional ANOVA, Bayesian Optimization

1. Introduction

As a result of the liberalization and deregulationof the electricity markets in the last two decades,the dynamics of electricity trade have been com-pletely reshaped. In particular, electricity has be-come a commodity that displays a set of charac-teristics that are uncommon to other markets: aconstant balance between production and consump-tion, load and generation that are influenced by ex-ternal weather conditions, and dependence of theconsumption on the hour of the day, day of theweek, and time of the year [1]. Due to these facts,the dynamics of electricity prices exhibit behaviorunseen in other markets, e.g. sudden and unex-pected price peaks or seasonality of prices at threedifferent levels (daily, weekly, and yearly) [1].

∗Corresponding authorEmail address: [email protected] (Jesus Lago)

As a result of this unique behavior, electricitymarkets have become a central point of researchin the energy sector and accurate electricity priceforecasting has emerged as one of the biggest chal-lenges faced by the different market entities. Theusual motivation behind these efforts is a purelyeconomic one: as forecasting accuracy increases,the negative economic effects of price uncertaintyare mitigated and the market players make an eco-nomic profit. In addition, another important fact toconsider is that electricity markets are establishedto keep the grid stable. In particular, as prices be-come more volatile, the balance of the grid is com-promised, strategic reserves may have to be used,and the risk of a blackout increases. Therefore,by accurately forecasting electricity prices, not onlyeconomic profits can be made, but also the systemstability is improved.

Due to the above motivations, electricity price

Preprint submitted to Elsevier

forecasting has been continuously developed andimproved for the last decades, and as a result, theliterature comprises a large variety of distinctive ap-proaches, e.g. see the literature review [1]. Never-theless, to the best of our knowledge, a topic thathas been not yet addressed is the influence of neigh-boring and connected markets, i.e. market integra-tion, on the forecast accuracy. In particular, as dif-ferent areas in the world, e.g. the European Union[2], are enforcing a larger level of integration acrossnational electricity markets, it is sensible to assumethat neighboring markets might play a role in theforecasting efficiency. To address this scientific gap,this paper proposes a modeling framework that isable to improve predictive accuracy by exploitingthe relations across electricity markets. In particu-lar, by modeling market integration in two differentways, the proposed framework is shown to obtainstatistically significant improvements.

The paper is organized as follows: Section 2starts by presenting the literature review, motiva-tion, and contributions. Next, Section 3 and 4 re-spectively describe the methods and data that areused in the research. Then, Section 5 defines theproposed modeling framework. Next, Section 6 de-rives a novel approach for feature selection and usesit to select the optimal features in the case study.Finally, Section 7 evaluates the proposed model-ing framework by means of predictive accuracy, andSection 8 summarizes and concludes the paper.

2. Literature Survey and Contributions

In this section, we present the literature review ofthree topics that are relevant for the research: elec-tricity price forecasting, market integration, andfeature selection. Based on that, we motivate ourwork and explain our contributions.

2.1. Electricity Price Forecasting

The price forecasting literature is typically di-vided into five areas: (1) multi-agent or gametheory models simulating the operation of marketagents, (2) fundamental methods employing physi-cal and economic factors, (3) reduced-form modelsusing statistical properties of electricity trade forrisk and derivatives evaluation, (4) statistical mod-els comprising time series and econometric models,and (5) artificial intelligence methods [1]. For fore-casting day-ahead prices, or in general any othertype of electricity spot prices, statistical and artifi-cial intelligence methods have showed to yield the

best results [1]. As a result, they are the main focusof this review.

Typical statistical methods are: AR and ARXmodels [3], ARIMA models [4, 5], dynamic re-gression [6], transfer functions [6], double sea-sonal Holtz-Winter model [7], TARX model [8],semi/non-parametric models [3], or GARCH-basedmodels [9]. In addition, within the same class ofmethods, different hybrid models have been also ap-plied, e.g. wavelet-based models [5, 10, 11].

Statistical models are usually linear forecasters,and as such, they are successful in the areas wherethe frequency of the data is low, e.g. for weeklypatterns. However, for hourly values, the nonlinearbehavior of the data might be too complicated topredict [12]. As a result, motivated by the need forforecasters that are able to predict the nonlinearbehavior of hourly prices, several artificial intelli-gence methods have been proposed. Among thesemethods, artificial neural networks [13–16], supportvector regressors [17], radial basis function networks[18], and fuzzy networks [19] are among the mostcommonly used. A recent study [20] showed thatDeep Neural Networks (DNNs) can also be a suc-cessful alternative.

The results comparing the accuracy of the men-tioned models have however produced unclear con-clusions [14]. In general, the effectiveness of eachmodel seems to depend on the market under studyand on the period considered.

2.2. Market Integration

In the last decades, the EU has passed severallaws trying to achieve a single and integrated Euro-pean electricity market [2, 21]. At the moment,while a single market is far from existing, thereis evidence suggesting that the level of integrationacross the different regional markets has been in-creasing over time [22]. In particular, evidence sug-gests that in the case of Belgium and France, thespot prices share strong common dynamics [23].

While some researchers have evaluated the levelof integration of the European markets [22–24], andothers have proposed statistical models to evalu-ate the probability of spike transmissions across EUmarkets [25], the literature regarding market inte-gration to improve forecasting accuracy is ratherscarce. To the best of our knowledge, only twoother works have taken into account some sort ofmarket integration, namely [26] and [27].

In particular, [26] analyzes the effect of using theday-ahead prices of the Energy Exchange Austria

2

(EXAA) on a given day to forecast the prices ofother European markets on the same day. Using thefact that for the EXAA market the clearing pricesare released before the closure of other Europeanmarkets, [26] models the price dynamics of severalEuropean markets and considers the EXAA pricesof the same day as part of these models. It is shownthat, for certain European markets, using the avail-able prices from the EXAA improves the forecastingaccuracy in a statistically significant manner.

Similarly, [27] considers external price forecastsfrom other European markets as exogenous inputsof an artificial neural network to predict Italian day-ahead prices. [27] shows that using the given fore-casts the accuracy of their network can be improvedfrom 19.08% to 18.40% mean absolute percentageerror (MAPE).

2.3. Feature Selection

Feature selection is defined as the process to se-lect, for a given model, the subset of importantand relevant input variables, i.e. features. Typi-cally, three families of methods to perform featureselection exist: filter, wrapper, and embedded meth-ods [28]. Filter methods apply some statistical mea-sure to assess the importance of features [29]. Theirmain disadvantage is that, as the specific modelperformance is not evaluated and the relations be-tween features are not considered, they may selectredundant information or avoid selecting some im-portant features. Their main advantage is that, asa model does not have to be estimated, they arevery fast. By contrast, wrapper methods performa search across several feature sets, evaluating theperformance of a given set by first estimating theprediction model and then using the predictive ac-curacy of the model as the performance measure ofthe set [29]. Their main advantage is that they con-sider a more realistic evaluation of the performanceand interrelations of the features; their drawback isa long computation time. Finally, embedded meth-ods, e.g. regularization [30, Chapter 7], learn thefeature selection at the same time the model is es-timated. Their advantage is that, while being lesscomputationally expensive than wrapper methods,they still consider the underlying model. However,as a drawback, they are specific to a learning algo-rithm, and thus, they cannot always be applied.

Approaches for feature selection in the electricityprice forecasting literature vary according to theprediction model used. For time series methods us-ing only prices, e.g. ARIMA, autocorrelation plots

[10] or the Akaike information criterion [31] havebeen commonly used. In the case of forecasters withexplanatory variables, e.g. neural networks, mostresearchers have used trial and error or filter meth-ods based on linear analysis techniques: statisticalsensitivity analysis [7, 13], correlation analysis [32],or principal component analysis [33]. Since pricesdisplay nonlinear dynamics, the mentioned tech-niques might be limited [34]; to address this, nonlin-ear filter methods such as the relief algorithm [35] ortechniques based on mutual information [34, 36, 37]have been proposed. More recently, a hybrid non-linear filter-wrapper method, which uses mutual in-formation and information content as a first filterstep and a real-coded genetic algorithm as a secondwrapper step, has been proposed [38].

2.4. Motivation and Contributions

While the effects of market integration can dra-matically modify the dynamics of electricity prices,there is a lack of a general modeling framework thatcould model this effect and analyze its impact onthe electricity market. To address this gap, in thispaper we provide general models to identify theserelations and a technique to quantify the impor-tance of market integration. As we will show, un-derstanding these relations is key to improve theaccuracy of forecasting models, and thus, to obtainenergy systems that are economically more efficient.

The two available papers on market integrationin price forecasting, [26, 27], are both limited tothe case where the day-ahead prices of neighboringmarkets are known in advance. While these papersprovide a first modeling approach for market inte-gration, the methodologies are very specific and canonly be applied in limited situations. In particu-lar, most European electricity markets release theirday-ahead prices at the same time, and thus, theprices of neighboring markets cannot be obtainedin advance. The only exception to this rule is theEXAA market, which was the object of study of[26]. In addition to this limitation, neither [26] nor[27] analyzed the relevance of market integration.

In contrast to [26, 27], we propose a general mod-eling framework that is able to model and analyzemarket integration for any given market. In par-ticular, we propose a modeling framework basedon DNNs that considers market integration fea-tures that are available beforehand in all Europeanmarkets. Using past prices and publicly availableload/generation forecasts in neighboring markets,we propose a first forecaster that models market

3

integration effects on price dynamics. Next, wepropose a second forecaster that further generalizesmarket integration: besides modeling market inte-gration using input features, the second forecasteralso includes the effect in the output space. By si-multaneously predicting prices in multiple markets,the proposed forecaster is able to improve the pre-dictive accuracy.

Finally, we also contribute to the field of featureselection algorithms. More specifically, while thefeature selection methods for electricity price fore-casting proposed in the literature provide good andfast algorithms, they suffer from three main draw-backs: (1) They all [7, 10, 13, 32–36, 38] perform afilter step where the model performance is not di-rectly considered; therefore, the resulting selectedfeatures might be redundant or incomplete. (2) Inthe case of the algorithms for nonlinear models [34–36, 38], the inputs have to be transformed to lower-dimensional spaces; as a result, feature informationmight be lost. (3) While they provide a selectionof features, none of these methods computes therelative importance of each feature.

To address these issues, we propose a wrapper se-lection algorithm based on functional ANOVA thatdirectly selects features using nonlinear models andwithout any feature transformation. While the pro-posed approach is computationally more expensivethan previously proposed methods, it can perform amore accurate feature selection as it avoids transfor-mations, selects the features based on the originalmodel, and computes the individual performance ofeach feature.

3. Preliminaries

In this section we introduce the theoretical con-cepts and algorithms that are used and/or modifiedlater on in the paper.

3.1. Day-ahead Forecasting

The day-ahead electricity market is a type ofpower exchange widely used in several regions of theworld. In its most general format, producers andconsumers have to submit bids for the 24 hours ofday d before some deadline on day d−1 (in most Eu-ropean markets, this deadline occurs at 11:00 am or12:00 am). Except for some markets, these bids aretypically defined per hour, i.e. every market playerhas to submit 24 bids.

After the deadline, the market operator takesinto account all the bids and computes the mar-ket clearing price for each of the 24 hours. Then,consumer/producer bids larger/lower or equal thanthe market clearing prices are approved, and a con-tract is established.

A useful forecaster of the day ahead marketshould thus be able to predict the set of 24 marketclearing prices of day d based on the informationavailable before the deadline of day d− 1.

3.2. Deep Learning and DNNs

During the last decade, the field of neural net-works has gone trough some major innovations thathave lead to what nowadays is known as deep learn-ing [30]. Specifically, the term deep refers to the factthat, thanks to the novel developments of recentyears, we can now train different neural networkconfigurations whose depth is not just limited to asingle hidden layer (as in the traditional multilayerperceptron), and which have systemically showedbetter generalization capabilities [30].

While there are different DNN architectures,e.g. convolutional networks or recurrent networks,in this paper we consider a standard DNN, i.e. amultilayer perceptron with more than a single hid-den layer.

3.2.1. Representation

Defining by X = [x1, . . . , xn]⊤ ∈ Rn the input

of the network, by Y = [y1, y2, . . . , ym]⊤ ∈ Rm the

output of the network, by nk the number of neuronsof the kth hidden layer, and by zk = [zk1, . . . , zknk

]⊤

the state vector in the kth hidden layer, a generalDNN with two hidden layers can be represented asin Figure 1.

x1

x2

...

xn

z11

z12

...

z1n1

z21

z22

...

z2n2

y1

y2

...

ym

Hiddenlayer

Hiddenlayer

Inputlayer

Outputlayer

...

Figure 1: Example of a DNN.

4

In this representation, the parameters of themodel are represented by the set of weights W thatestablish the mapping connections between the dif-ferent neurons of the network [30].

3.2.2. Training

The process of estimating the model weights W isusually called training. In particular, given a train-

ing set ST ={

(Xk,Yk)}N

k=1with N data points,

the network training is done by solving a generaloptimization problem with the following structure:

minimizeW

N∑

k=1

gk

(

Yk, F (Xk,W))

, (1)

where F : Rn → Rm is the neural network map, and

gk is the problem-specific cost function, e.g. the Eu-clidean norm or the average cross-entropy. Tradi-tional methods to solve (1) include gradient descentor the Levenberg–Marquardt algorithm [1]. How-ever, while these methods work well for small sized-networks, they display computational and scalabil-ity issues for DNNs. In particular, better alterna-tives for DNNs are the stochastic gradient descentand all its variants [39].

It is important to note that (1) is an approxima-tion of the real problem we wish to minimize. Par-ticularly, in an ideal situation, we would minimizethe cost function w.r.t. to the underlying data dis-tribution; however, as the distribution is unknown,the problem has to be approximated by minimizingthe cost function over the finite training set. Thisis especially relevant for neural networks, where amodel could be overfitted and have a good perfor-mance in the training set, but perform badly inthe test set, i.e. a set with a different data dis-tribution. To avoid this situation, the network isusually trained in combination with regularizationtechniques, e.g. early stopping, and using out-of-sample data to evaluate the performance [30].

3.2.3. Network Hyperparameters

In addition to the weights, the network has sev-eral parameters that need to be selected before thetraining process. Typical parameters include thenumber of neurons of the hidden layers, the numberof hidden layers, the type of activation functions, orthe learning rate of the stochastic gradient descentmethod. To distinguish them from the main param-eters, i.e. the network weights, they are referred toas the network hyperparameters.

3.3. Hyperparameter Selection

In order to perform the selection of model hyper-parameters, papers in the field of electricity priceforecasting have traditionally defined a number ofconfigurations and chosen the one with the best per-formance [7, 14, 27, 32, 34]. Another approach, yetless usual, has been the use of evolutionary opti-mization algorithms in order to select the best net-work configuration [40]. However, while these ap-proaches might work under some conditions, theyhave some flaws. In particular, while the firstmethod implements fast decision-making, it doesnot provide an optimal selection of hyperparame-ters. Similarly, while the second method optimizesthe selection, it evaluates a very large number ofpoints in the hyperparameter space. As a result,if the function to be evaluated is costly, e.g. whentraining a DNN, the second method requires a largecomputation time.

An alternative to tackle these issues is Bayesianoptimization [41], a family of algorithms for op-timizing black-box functions that require a lowernumber of function evaluations than evolutionaryoptimization techniques. In particular, their work-ing principle is to sequentially evaluate new samplesin the function space, drawing new samples by usingthe information obtained in the previously exploredsamples as a prior belief. Based on that, they re-duce the number of evaluated sample points andlead to a more efficient optimization.

3.3.1. Hyperparameter Optimization

We consider a Bayesian optimization algorithmthat has been widely used in the machine learn-ing community. In particular, we use the Tree-Structured Parzen Estimator (TPE) [42], an op-timization algorithm within the family of sequen-tial model-based optimization methods [43]. Thebasic principle of a sequential model-based opti-mization algorithm is to optimize a black-box func-tion, e.g. the performance of a neural network asa function of the hyperparameters, by iterativelyestimating an approximation of the function andexploring the function space using the local min-ima of the approximation. At any given iterationi, the algorithm evaluates the black-box functionat a new point θi. Next, it estimates an approxi-mation M of the black-box function by fitting thepreviously sampled points to the obtained functionevaluations. Then, it selects the next sample pointθi+1 by numerically optimizing M and starts the

5

next iteration. Finally, after a maximum number ofiterations T have been performed, the algorithm se-lects the best configuration. Algorithm 1 representsan example of a sequential model-based optimiza-tion algorithm for hyperparameter selection.

Algorithm 1 Hyperparameter Optimization

1: procedure SMBO(T,θ0)2: θi ← θ0

3: H ← ∅4: for i = 1, . . . , T do

5: pi ← TrainNetwork(θi)6: H ← H ∪

{

(pi,θi)}

7: if i < T then

8: Mi(θ)← EstimateModel(H)9: θi ← argmaxθ Mi(θ)

10: end if

11: end for

12: θ∗ ← BestHyperparameters(H)

13: return θ∗

14: end procedure

3.3.2. Hyperparameter Analysis

An optional step after hyperparameter optimiza-tion is to perform an analysis of the hyperparam-eter importance. In particular, while the optimalhyperparameter configuration has been already ob-tained, it is unknown how much each hyperparam-eter contributes to the overall performance. Inves-tigating this is specially relevant in order to avoidunnecessary model complexities; e.g. while the op-timal number of neurons might be large, reducingthe number of neurons might barely affect the per-formance.

Functional ANOVA

An approach for carrying on such an analysis is pro-posed in [44], where a novel method based on ran-dom forests and functional ANOVA is introduced.In particular, [44] considers the generic case of hav-ing z hyperparameters with domains Θ1, . . . ,Θz,and defines the following concepts:

• Hyperparameter set Z = {1, . . . , z}.

• Hyperparameter space Θ : Θ1 × . . .×Θz.

• Hyperparameter instantiation θ = [θ1, . . . ,θz]⊤.

• Hyperparameter subset U = {u1, . . . , uq} ⊆ Zand associated partial hyperparameter instan-tiation θU = [θu1

, . . . , θuq]⊤.

Then, given a set H ={

(θk, pk)}T

k=1of hyper-

parameter realizations, the proposed method fits arandom forest model MRF(θ) to build a predictorof the performance p as a function of the hyperpa-rameter vector θ.

Then, usingMRF, the method defines a marginalperformance predictor a(θU ) as a forecaster of theperformance of any partial hyperparameter instan-tiation θU . In particular, given a subset U ⊆ Z,a(θU ) provides an estimation of the average perfor-mance across the hyperparameter space Z \U whenthe hyperparameters of U are fixed at θU .

Finally, using the marginal performance predic-tor a(θU ), the algorithm carries out a functionalANOVA analysis to estimate the importance ofeach hyperparameter. Particularly, defining the to-tal variance across the performance by V, the algo-rithm partitions V as a sum of individual variancecontributions of subsets U ⊆ Z to V :

V =∑

U⊆Z

VU , (2)

where VU is the contribution of subset U to thetotal variance. Then, the importance FU of eachsubset U is computed based on the subset contri-bution to the total performance variance:

FU =VU

V. (3)

For the particular case of the hyperparameter im-portance, the algorithm just evaluates FU for eachsubset U = {i} composed of a single hyperparam-eter. As in [44], we refer to the variance contribu-tions FU of single hyperparameters as main effectsand to the rest as interaction effects.

It is important to note that, in addition to the im-portance FU , the algorithm also provides, for eachpartial hyperparameter instantiation θU , the pre-diction of the marginal performance a(θU ) and anestimation of its standard deviation σθU

.

3.4. Performance Metrics

In order to evaluate the accuracy of the pro-posed models, we need a performance metric. Inthis paper, as motivated below, we use the symmet-ric mean absolute percentage error (sMAPE) [45].Given a vector Y = [y1, . . . , yN ]⊤ of real outputs

6

and a vector Y = [y1, . . . , yN ]⊤ of predicted out-puts, the sMAPE metric can be computed as:

sMAPE =100

N

N∑

k=1

|yk − yk|

(|yk|+ |yk|)/2. (4)

The reason for selecting the sMAPE instead of themore traditional MAPE is the fact that the MAPEis affected by different issues [45]. Particularly, forour application, the MAPE becomes sensitive tovalues close to zero. When an output yi gets closeto zero, the corresponding MAPE contribution be-comes very large and it dominates the final value.

3.5. Diebold-Mariano (DM) Test

The sMAPE metric defined above only providesan assessment of which model has, for the datause, a better accuracy. While the accuracy of amodel can be higher, the difference in performancemight be not significant enough to establish thatthe model is really better. To assess the statisticalsignificance in the difference of predictive accuracyperformance, a commonly used tool is the Diebold-Mariano test [46].

Given a time series vector Y = [y1, . . . , yN ]⊤

to be forecasted, two prediction models M1 andM2, and the associated forecasting errors ε

M1 =[εM1

1 , . . . , εM1

N ]⊤ and εM2 = [εM2

1 , . . . , εM2

N ]⊤, theDM test evaluates whether there is a significant dif-ference in performance accuracy based on an errorloss function L(εMi

k ). In particular, the DM testbuilds a loss differential function as:

dM1,M2

k = L(εM1

k )− L(εM2

k ), (5)

and then, it tests the null hypothesis H0 of bothmodels having equal accuracy, i.e. equal expectedloss, against the alternative hypothesis H1 of themodels having different accuracy, i.e.:

Two-sidedDM test

{

H0 : E(dM1,M2

k ) = 0,

H1 : E(dM1,M2

k ) 6= 0,(6)

with E representing the expected value. Similar tothe standard two-sided test, a one-sided DM testcan be built by testing the null hypothesis that theaccuracy of M1 is equal or worse than the accu-racy of M2 versus the alternative hypothesis of theaccuracy of M1 being better:

One-sidedDM test

{

H0 : E(dM1,M2

k ) ≥ 0,

H1 : E(dM1,M2

k ) < 0.(7)

While the loss function L can be freely chosen, ithas to ensure that the resulting loss differential iscovariance stationary. A loss function that is typi-cally used is:

L(εMi

k ) = |εMi

k |p, (8)

where usually p ∈ {1, 2}.

4. Data

In this section, the data used for the research isintroduced.

4.1. Data Selection and Motivation

In general, when looking at the day-ahead fore-casting literature, many inputs have been proposedas meaningful explanatory variables, e.g. tempera-ture, gas and coal prices, grid load, available gen-eration, or weather [1].

To make our selection, we try to make sure thatthe selected data is not only related to the pricedynamics, but also fulfills some minimum require-ments. More specifically, we only choose data thatis freely available for most European markets sothat the proposed models can easily be exported toother EU markets. Moreover, we ensure that thedata represents market integration, i.e. that comesfrom two connected markets. In particular, we se-lect the period from 01/01/2010 to 31/11/2016 asthe time range of study, and we consider the follow-ing data:

1. Day-ahead prices from the EPEX-Belgium andEPEX-France power exchanges. They are re-spectively denoted as pB and pF.

2. Day-ahead forecasts of the grid load and gen-eration capacity in Belgium and France. Likein other European markets, these forecasts areavailable before the bid deadline on the websiteof the transmission system operators (TSOs):ELIA for Belgium and RTE for France. Theyare respectively denoted as lB and gB for Bel-gium, and as lF and gF for France.

3. Calendar of public holidays HF and HB inFrance and Belgium in the defined time range.

While it could be argued that different weatherdata could also be easily accessible and importantfor the forecasting, for our research, we have de-cided to disregard them for two main reasons:

7

1. Weather factors are already indirectly takeninto account in the grid load and generationforecasts provided by the TSO. In particular,the generation forecast has to consider weatherinformation regarding wind speed and solar ra-diation. Likewise, load forecasts also need toconsider temperature and other weather vari-ables to obtain the electricity consumption.

2. Weather data are local phenomena, and assuch, they can greatly vary from one part ofa country to another. As a result, unlike thegrid load or generation data, it is not possibleto select a single value of the temperature orany other weather data for a given time inter-val.

4.2. Data Processing

It is important to note that the data used ismostly unprocessed. In particular, as we intendto forecast and detect spikes, price outliers arenot eliminated. The only data transformation is aprice interpolation and elimination every year cor-responding respectively to the missing and extravalues due to the daylight saving. In addition, whileall the metrics and tests are computed using thereal prices, the training of the neural networks isdone with data normalized to the interval [−1, 1].This last step is necessary because the input fea-tures have very different ranges; therefore, if thedata is not normalized, the training time increasesand the final result is a network that displays, ingeneral, worse performance [47].

4.3. Data Division

To perform the different experiments, we dividethe data into three sets:

1. Training set (01/01/2010 to 31/11/2014):These data are used for training and estimat-ing the different models.

2. Validation set (01/11/2014 to 31/11/2015): Ayear of data is used to conduct early-stoppingto ensure that the model does not overfit and toselect optimal hyperparameters and features.

3. Test set (01/11/2015 to 31/11/2016): A year ofdata, which is not used at any step during themodel estimation process, is employed as theout-of-sample dataset to compare and evaluatethe models.

4.4. Data Access

For the sake of reproducibility, we have only usedpublicly available data. In particular, the load andgeneration day-ahead forecasts are available on thewebpages of RTE [48] and Elia [49], the respectiveTSOs in France and Belgium. In the case of theprices, they can be obtained from the ENTSO-Etransparency platform [50].

5. Modeling Framework

In this section, two different models are proposedto include market integration in day-ahead forecast-ing. The two models are similar to each other asboth of them try to forecast the full set of day-ahead prices. However, they differ from each otherin the number and type of prices that they pre-dict; in particular, while the first model predictsthe day-ahead prices of a single market, the sec-ond model combines a dual market prediction intoa single model.

5.1. Single-Market Day-Ahead Forecaster

The basic model for predicting day-ahead pricesuses a DNN in order to forecast the set of 24 day-ahead prices.

5.1.1. Conceptual Idea

Based on the results of [20], we select a DNNwith two hidden layers as forecasting model. Defin-ing the input of the model as the relevant dataX = [x1, . . . , xn]⊤ ∈ R

n available at day d − 1in the local and neighboring markets, and let-ting n1 and n2 be the number of neurons of thefirst and the second hidden layer respectively, andp = [p1, p2, . . . , p24]⊤ ∈ R

24 the set of 24 day-aheadprices to be forecasted, the proposed model can berepresented as in Figure 2.

5.1.2. Model Parameters

The parameters of the DNN are represented bythe set of weights that establish the mapping con-nections between the different neurons of the net-work:

• Wi,i: the vector of weights between the inputX and the neuron i of the first hidden layer.

• Wh,i: the vector of weights between the firsthidden layer and the neuron i of the secondhidden layer.

8

x1

x2

...

xn

z11

z12

z13

...

z1n1

z21

z22

z23

...

z2n2

p1

p2

...

p24

Hiddenlayer

HiddenlayerInput

layerOutput

layer

...

Figure 2: DNN to forecast day-ahead prices.

• Wo,i: the vector of weights between the secondhidden layer and the output price vector p.

• bk = [bk1, . . . , bknk]⊤: the vector of bias

weights in the kth hidden layer, with k = 1, 2.

• bo = [bo,1 . . . , bo,24]⊤: the vector of biasweights in the output layer.

5.1.3. Model Equations

Using the above definitions, the equations of theDNN can be defined as:

z1i = f1i

(

W⊤i,i ·X + b1i

)

, for i = 1, . . . n1, (9a)

z2i = f2i

(

W⊤h,i · z1 + b2i

)

, for i = 1, . . . n2, (9b)

pi = fo,i

(

W⊤o,i · z2 + bo,i

)

, for i = 1, . . . 24, (9c)

where f1i and f2i respectively represent the acti-vation function of neuron i in the first and secondhidden layer, and where fo,i is the activation func-tion of neuron i in the output layer.

5.1.4. Network Structure

The rectified linear unit [51] is selected as theactivation function of the two hidden layers. How-ever, as the prices are real numbers, no activationfunction is used for the output layer.

To select the dimension n of the network inputand the dimensions n1 and n2 of the hidden layers, afeature selection and hyperparameter optimizationare performed.

5.1.5. Training

The DNN is trained by minimizing the meanabsolute error. In particular, given the training

set ST ={

(Xk,pk)}N

k=1, the optimization prob-

lem that is solved to train the neural network is:

minimizeW

N∑

k=1

‖pk − F (Xk,W)‖1, (10)

where F : Rn → R

24 is the neural network map.The selection of the mean absolute error instead ofthe more traditional root mean square error is donefor a simple reason: as the electricity prices havevery large spikes, the Euclidean norm would puttoo much importance on the spiky prices.

The optimization problem is initialized via single-start with the Glorot initialization [52] and solvedusing Adam [53], a version of the stochastic gradi-ent descent method that computes adaptive learn-ing rates for each model parameter. Adam is se-lected for a clear reason: as the learning rate is au-tomatically computed, the time needed to tune thelearning rate is smaller in comparison with otheroptimization methods. Together with Adam, theforecaster also considers early stopping [54] to avoidoverfitting.

5.2. Dual Market Day-Ahead Forecaster

A possible variant of the single-market model isa forecaster that predicts the prices of two marketsin a single model. While this might seem counter-intuitive at first, i.e. adding extra outputs to themodel could compromise its ability to forecast theset of 24 prices that we are really interested in, thisapproach can, in fact, lead to neural networks thatare able to generalize better.

5.2.1. Conceptual Idea

The general idea behind forecasting two marketstogether is that, as we expect prices in both mar-kets to be interrelated and to have similar dynam-ics, by forecasting both time series in a single modelwe expect the neural network to learn more accu-rate relations. In particular, it has been empiricallyshown that DNNs can learn features that can, tosome extent, generalize across tasks [55]. Similarly,it has also been shown that, by forcing DNNs tolearn auxiliary related tasks, the performance andlearning speed can be improved [56, 57].

There are some possible hypotheses that can ex-plain why training with multiple outputs can helpto improve the performance:

9

1. The simplest explanation is the amount ofdata: as more data is available, the neural net-work can learn more relevant features. More-over, as the tasks are related, the neural net-work has more data to learn features that arecommon to all tasks.

2. A second reason is regularization: By solvingdifferent tasks, the network is forced to learnfeatures useful for all tasks and to not overfitto the data of a single task.

5.2.2. Model Implementation

Consider an electricity market B and a sec-ond electricity market F that is connected to B.Then, defining the output of the network by p =[pB1

, . . . , pB24, pF1

, . . . , pF24]⊤ ∈ R

48, i.e. the set of48 day-ahead prices from markets B and F, andkeeping the rest of the DNN parameter definitionsthe same, the new DNN structure can be repre-sented as in Figure 3. In addition, as both mod-els only differ in the output size, the implementa-tion details are exactly the same as defined for thesingle-market model in Section 5.1.5.

x1

x2

x3

x4

...

xn

z11

z12

z13

z14

z15

...

z1n1

z21

z22

z23

z24

z25

...

z2n2

pB1

...

pB24

pF1

...

pF24

Hiddenlayer

HiddenlayerInput

layerOutput

layer

...

Figure 3: DNN to simultaneously forecast day-ahead pricesin two markets.

6. Feature Selection Algorithm

As explained in the introduction, while the fea-ture selection methods for electricity price forecast-ing proposed in the literature provide good and fastalgorithms, they have two drawbacks:

1. They perform a filter step where the model per-formance is not considered.

2. For the nonlinear methods, the different inputshave to be transformed, i.e. the selection is notdone over the original feature set, and thus,some feature information might be lost.

Therefore, we propose a nonlinear wrappermethod that directly evaluates the features on theprediction model; in particular, while the approachis more computationally demanding, it can providea better selection as it uses the real predictive per-formance without any data transformations.

6.1. Algorithm Definition

In Section 3.3 we have introduced the TPE algo-rithm, a method for hyperparameter optimization,together with functional ANOVA, an approach forassessing hyperparameter importance. In this sec-tion, we combine both methods to build a featureselection algorithm that consists of four steps:

1. Model the features as hyperparameters.

2. Optimize the hyperparameters/features.

3. Analyze the results.

4. Select the important features.

6.1.1. Features as Hyperparameters

The first step of the algorithm is to model theselection of features as model hyperparameters. Inparticular, we consider two types of features:

1. Binary features θB , whose selection can bedone through a binary variable, i.e. θB ∈{0, 1}, where θB = 0 would represent featureexclusion and θB = 1 feature inclusion. Binaryfeatures represent the type of features consid-ered by traditional algorithms. An examplewould be whether to include holidays data orwhether to select a specific lag in an ARIMAmodel.

2. Integer features θI , which not only can modelthe inclusion-exclusion of an input, but also se-lect some associated size or length, i.e. θI ∈ Z,where θI = 0 represents exclusion. Exampleswould be the number of past days of price dataor the maximum lag of an ARIMA model.

Given these definitions, the binary features aremodeled as hyperparameters using the hyperpa-rameter space ΘB and the hyperparameter setB = {1, . . . , nB}. Likewise, the integer featuresare modeled by the hyperparameter space ΘI and

10

the hyperparameter set I = {nB + 1, . . . , nB +nI}.Finally, the full hyperparameter space is definedby Θ = ΘB ∪ΘI and the hyperparameter set byZ = B ∪ I.

6.1.2. Feature Optimization

The second step of the algorithm is to perform aTPE optimization over the hyperparameter-featurespace. The result of the algorithm is the opti-mal feature selection θ

∗ together with the set H ={

(θk, pk)}T

k=1of feature-performance pairs, where

pk represents the model predictive accuracy whenusing the feature selection θk.

The fact that a feature is part of θ∗, does not

guarantee that the feature is relevant; specifically,a feature might have little or no effect in the per-formance, and still, as long as it does not have anegative effect, it might appear in the optimal con-figuration. As a result, if no further processing isconsidered, the algorithm might select redundantfeatures, and in turn, lead to more computationallyexpensive models and increase the risk of overfit.

6.1.3. Feature Analysis

To solve the problem of detecting unnecessaryfeatures, the algorithm comprises a third step wherefeature importance is analyzed. In particular, us-ing the functional ANOVA methodology proposedin [44], the algorithm analyzes H and provides theimportance of each feature i and each pairwise in-teraction {i, j} as the percentage-wise contributionto the performance variance V. Using the defini-tions given in Section 3.3.2 and (2)-(3), the algo-rithm computes the importance of feature Θi andeach pairwise interaction Θi ×Θj by:

F{i} =V{i}

V, F{i,j} =

V{i,j}

V. (11)

In addition, for each feature i ∈ Z and featureinstantiation θi ∈ Θi, the algorithm also providesthe predicted marginal performance a(θi).

6.1.4. Feature Selection

The fourth and final algorithm step is the selec-tion itself. In particular, making use of the obtainedF{i}, F{i,j} and a(θi), the selection procedure per-forms the following steps:

1. Define a threshold parameter ǫ ∈ (0, 1].

2. Make a pre-selection by discarding featuresthat do not improve nor decrease the perfor-mance. In particular, regard features i whoseimportance F{i} is larger than ǫ:

U∗1 = {i ∈ Z | F{i} > ǫ}, (12a)

or features i that have at least one pairwisecontribution F{i,j} larger than ǫ:

U∗2 = {i ∈ Z | ∃ j ∈ Z \ {i} : F{i,j} > ǫ}.

(12b)

3. With the remaining features in U∗1 ∪ U∗

2 ,perform a second selection U∗ by discardingthose features whose predicted marginal per-formance a(θi) is lower when being includedthan when being excluded, i.e.:

U∗ = {i ∈ U∗1 ∪U

∗2 | ∃ θi ∈ Θi : µθi,0 < a(θi)},

(12c)where µθi,0 represents the marginal perfor-mance a(θi = 0) of excluding feature i.

4. Finally, the set of selected binary features canbe obtained by:

U∗B = U∗ ∩B. (12d)

Similarly, for the set of optimal integer featuresU∗I , the selection is done in terms of the fea-

ture itself and the instantiation with the bestperformance:

U∗I =

{

{i, θ∗i } | i ∈ U∗ ∩ I, θ∗i = argmaxθi

a(θi)}

.

(12e)

6.2. Case Study

To evaluate the proposed algorithm, we use it toselect the features for predicting Belgian prices andto obtain a first assessment of the effect of mar-ket integration, i.e. the effect of French features inforecasting Belgian prices. To perform the analysis,we consider the first and simpler DNN proposed inSection 5.

6.2.1. Feature Definition

In order to perform the feature selection, we firstneed to model each possible input as either a bi-nary or an integer feature. As described in Section4, the available features are the day ahead pricespB and pF, the day-ahead forecasts lB and lF ofthe grid load, the day-ahead forecasts gB and gF ofthe available generation, and the calendar of publicholidays HB and HF.

11

Considering that, given the market at timeh, we aim at forecasting the time series vectorpBh

= [pBh+1, . . . , pBh+24

]⊤ of Belgian day-aheadprices, the use of the day-ahead loads lBh

=[lBh+1

, . . . , lBh+24]⊤ and lFh

= [lFh+1, . . . , lFh+24

]⊤,and the use of the day-ahead capacity gener-ations gBh

= [gBh+1, . . . , gBh+24

]⊤ and gFh=

[gFh+1, . . . , gFh+24

]⊤, should be modeled as binaryfeatures θlB , θlF , θgB , and θgF .

Similarly, for the public holidays, the features canalso be modeled as binary variables θHB

and θHF.

In particular, as the set of 24 hours of a day iseither a holiday or not, the holidays are defined asmodel inputs XHB

, XHF∈ {0, 1}, with 0 and 1

representing respectively no holiday and holiday.To model the Belgian prices, we need to use an

integer feature to select the number of the consid-ered past values. In particular, as the prices displaydaily and weekly seasonality, we have to use twointeger features: θpB,d

∈ {1, 2, . . . , 6} as the fea-ture modeling the number of past days during thelast week (daily seasonality) and θpB,w

∈ {1, 2, 3} asthe feature modeling the number of days at weeklylags (weekly seasonality). Based on the selection ofθpB,d

and θpB,w, the considered EPEX-Belgium past

prices can be decomposed as the price inputs XdpB,h

at daily lags and the price inputs XwpB,h

at weeklylags:

XdpB,h

=[

pBh−i1, . . . , pBh−iNd

]⊤, (13a)

XwpB,h

=[

pBh−j1, . . . , pBh−jNw

]⊤, (13b)

where:

{i1, . . . , iNd} = {i | 0 ≤ i ≤ 24 · θpB,d

− 1} (13c)

{j1, . . . , jNw} = {j | 1 ≤ k ≤ θpB,w

, (13d)

k · 168 · θpB,d≤ j ≤ k · 192 · θpB,d

− 1}

It is important to note that, as this is the time seriesto be predicted, we disregard the cases where nodaily nor weekly seasonality is used, i.e. θpB,d

= 0or θpB,w

= 0.Finally, for the EPEX-France prices we could use

the same integer features as for EPEX-Belgium.However, for simplicity, we directly consider thesame lags for both time series and model the Frenchprices as a binary feature θpF

. It is important tonote that, despite having the same length, the se-lection of both time series is still independent; par-ticularly, the lags are only defined for Belgium, andthe French prices are just excluded or included. Themodeled input features are summarized in Table 1.

Feature Domain Definition

θpB,d {1, . . . , 6}Number of past days

for input price sequence

θpB,w {1, . . . , 3}Days at weekly lags

for input price sequence

θpF {0, 1} Day-ahead price in France

θlB {0, 1} Load in Belgium

θlF {0, 1} Load in France

θgB {0, 1} Generation in Belgium

θgF {0, 1} Generation in France

θHB{0, 1} Holiday in Belgium

θHF{0, 1} Holiday in France

Table 1: Definition of the modeled input features.

6.2.2. Hyperparameter Optimization

In order to guarantee that the network is adaptedaccording to the input size, we simultaneously op-timize the hyperparameters of the DNN, i.e. thenumber of neurons n1 and n2. In particular, as thefeature selection method is based on a hyperparam-eter optimization, we directly include the numberof neurons as integer hyperparameters that are op-timized together with the features. We set the do-main of n1 as the set of integers {100, 101, . . . , 400}and the one of n2 as {0} ∪ {48, 49, . . . , 360}, wheren2 = 0 represents removing the second hidden layerand using a network of depth one.

6.2.3. Experimental Setup

In order to use the proposed algorithm, we firstneed to define the threshold ǫ for the minimum vari-ance contribution; in our case, we select ǫ = 0.5 %.In addition, we also need to select the maximumnumber of iterations T of the TPE algorithm; wefound T = 1000 to offer a good trade-off betweenperformance and accuracy. Particularly, consider-ing that training a single model takes 2 min, the fullfeature selection requires 30 h. While this mightseem a long time, this step is only performed af-ter some periodic time, e.g. a month, to reassessfeature dependencies; therefore, the proposed ap-proach and settings yield a feasible and accuratemethod for the time scale of day-ahead prices.

For implementing the functional analysis of vari-ance, we use the python library fANOVA developedby the authors of [44]. Likewise, for implementingthe TPE algorithm, we use the python library hy-peropt [58].

12

6.2.4. Results

In a first conducted experiment, we obtained anunexpected result: inclusion/exclusion of the gen-eration capacity in Belgium gB accounts for roughly75% of the performance variance V, with inclusionof gB dramatically decreasing the predictive accu-racy. Since the generation capacity has been suc-cessfully used by other authors as a market driver[1], this result requires some explanation. From Fig-ure 4, which displays the time series of gB, we cancomprehend the result: right before the transitionfrom the training to the validation set, the averagegB suffers a major change and drops from approx-imately 14 GW to 9 GW. Because of the drasticdrop, it is likely that some relations that are learnedbased on the training set, do not hold in the vali-dation set, and that as a result, the predictive per-formance in the validation set worsens when gB isconsidered.

2010 2011 2012 2013 2014 2015 2016

8

10

12

14

16

Validation

Set

Training

Set

Year

gB

[GW

]

Figure 4: Generation in Belgium in the considered period.

This regime change in gB violates the assump-tion that conditions in the training, validation, andtest sets are equal. Therefore, to perform a correctfeature selection and to guarantee that the threedatasets hold similar conditions, the experimentalsetup should disregard θgB . It is important to notethat, before taking this decision, we have consid-ered shuffling the data to ensure homogeneous con-ditions between the three sets. However, this alter-native was avoided for two reasons:

1. As the output prices in some samples are theinput features in others, data has to be dis-carded in order to avoid data contaminationbetween the three sets. As a result, since thelarger the dataset the better the DNN can gen-eralize, this implementation could potentially

decrease the predictive accuracy of the model.

2. Since the end goal of the model is to forecastrecent prices, it is meaningless to try to modelan input-output relation that no longer holds.

Considering these facts, a correct feature selec-tion is performed without θgB . As depicted in Ta-ble 2, the first result to be noted from the new ex-perimental results is that, as gB is a big source oferror, the variance V of the sMAPE performance isreduced by a factor of 5. In addition, as it could

V

Feature selection with gB 0.58 %2

Feature selection without gB 0.12 %2

Table 2: Performance variance with and without gB.

be expected, the results obtained in this new ex-periment display a more distributed contributionamong the different features. In particular, in thefirst experiment, gB was responsible for 75% of theperformance variance. Now, as depicted in Table3, French prices and load account for roughly 50 %of the total performance variance, and the availablegeneration in France, the load in Belgium, and thenumber of past days play a minor role.

Contribution to V

All main effects 64.9%

French load 28.4%

French prices 25.7%

French generation 4.78%

Belgium load 1.0%

Past days number 0.8%

Table 3: Variance contribution of single features for the sec-ond feature selection experiment.

Based on the above results, we can make a firstselection and remove from the set of possible inputsthe public holidays θHB

and θHFas both seem not

to be decisive. Similarly, we can select θpB,w= 1 as

the number of days at weekly lags seems to be non-critical. Finally, to complete the feature selection,we should use the marginal performances of the fiveimportant features represented in Figure 5; basedon them, it is clear that we should select the price,load and generation in France, discard the grid loadin Belgium, and use two days of past price data.

13

Exclusion Inclusion15

15.2

15.4

15.6

15.8

16

θlF

sMAPE

[%]

(a) French grid load.


15.2

15.4

15.6

15.8

16

θpF

sMAPE

[%]

(b) French prices.


15.2

15.4

15.6

15.8

16

θgF

sMAPE

[%]

(c) French generation capacity.


15.2

15.4

15.6

15.8

16

θlB

sMAPE

[%]

(d) Belgian grid load.

1 2 3 4 5 615

15.2

15.4

15.6

15.8

16

θpB,d

sMAPE

[%]

(e) Number of days of past prices in the last week.

Figure 5: Marginal performance on the validation set of thefive most important features.

Together with the features, we have also opti-mized the hyperparameters of the model. The re-sults show that the suitable numbers of neurons aren2 = 200 and n1 = 320.

6.3. Discussion

Based on the results of the feature selection algo-rithm, we should include the following features asmodel inputs:

1. Day-ahead load and generation in France:

2. Last two days of Belgian and French prices:

3. Belgian and French prices a week before:

In addition, while it seems that the differentFrench market features, i.e. market integration fea-tures, play a large role in the forecasting accuracy,

the results are only enough to have a general ideaof the importance of French data; particularly, astatistical analysis is required before making anyfurther conclusion.

Finally, while we have used the proposed algo-rithm to select the input features, we have not yetprovided an evaluation of its accuracy. In par-ticular, to assess its performance, we could com-pare models using only optimally selected featuresagainst models using also features that have beendiscarded; more specifically, we could evaluate thedifference in predictive accuracy by means of hy-pothesis testing (see Section 7.2.4).

7. Evaluation of Market Integration and

Modeling Framework

The analysis provided by the feature selection al-gorithm is based on the validation set; while thisdataset is not used for training the network, it isemployed for early stopping and hyperparameteroptimization. Therefore, to have a fully fair and un-biased evaluation, we need an extra comparison us-ing unseen data to the full training process. More-over, as the feature selection results were obtainedusing the first proposed model, results for the sec-ond model are also required. Finally, to have ameaningful assessment, the statistical significanceof the results should be computed. To fulfill therequirements, the goal of this section is twofold:

1. Provide statistical significance of the improve-ments of using French market data, i.e. marketintegration, by performing a DM test on theout-of-sample data represented by the test set.

2. Based on the same statistical test, demonstratehow a dual-market forecaster can provide sig-nificant improvements in predictive accuracy.

7.1. Diebold-Mariano Test

To assess the statistical significance in the dif-ference of predictive accuracy, we use the DM testas defined by (5)-(8). Since the neural network istrained using the absolute mean error, we choose touse also the absolute error to build the loss differ-ential:

dM1,M2

k = |εM1

k | − |εM2

k |. (14)

In addition, we follow the same procedure as in[26] and we perform an independent DM test foreach of the 24 time series representing the differ-ent hours of a day. The reason for this is that, as

14

we use the same information to forecast the set of24 prices, the forecast errors within the same daywould exhibit a high correlation. Moreover, to havean assessment of the whole error sequence, we alsoperform the DM test considering serial correlationof order k in the error sequence. Particularly, re-calling that optimal k-step-ahead forecast errors areat most (k − 1)-dependent [46], we perform a DMtest on the full loss differential considering serialcorrelation of order 23.

In the various experimental setups of this casestudy, we employ the one-sided DM test given by(7) at the 95% confidence level. This selection isdone because we want to assess whether the perfor-mance of a forecaster A is statistically significantlybetter than a forecaster B, not whether the perfor-mances of forecasters A and B are significantly dif-ferent (like it would be the case in the two-sided DMtest). In more detail, for each hour h = 1, . . . , 24 ofthe day, we test the null hypothesis of a model M1

that uses French data having the same or worse ac-curacy than a model M2 that uses no French data.More specifically, we perform the following tests:

{

H0 : E(dM1,M2

hk) ≥ 0,

H1 : E(dM1,M2

hk) < 0,

for h = 1, . . . 24, (15)

where [dh1, . . . , dhN/24

]⊤ represents the vector se-quence of loss differentials of hour h. In addition,we perform the same test but considering the fullloss differential sequence and assuming serial corre-lation:

{

H0 : E(dM1,M2

k ) ≥ 0,

H1 : E(dM1,M2

k ) < 0.(16)

7.2. French Market Data: Statistical Significance

In Section 6.2-6.3, we have showed that usingmarket data from connected markets can help toimprove the performance. In this section, we ex-tend the analysis by directly comparing a modelthat includes this type of data against a model thatexcludes it, and then, performing a DM test to an-alyze the statistical significance.

7.2.1. Experimental Setup

The model used to perform the evaluation isthe single-market forecaster employed for the fea-ture selection. In particular, based on the ob-tained hyperparameter results, we select n1 = 320

and n2 = 200; similarly, considering the optimizedprices lags obtained in the feature selection, we con-sider, as input sequence for the model, the Belgiumprices during the last two days and a week before.Then, we discard as input features the capacity gen-eration in Belgium as well as the holidays in bothcountries. Then, in order to compare the effect ofFrench data, we consider the remaining features aspossible inputs for the model, i.e. we compare thefirst model excluding all the French data and onlyconsidering Belgian prices with respect to the sec-ond model including the French data. We respec-tively refer to these two models as MNoFR and MFR.

In addition, while the load in Belgium lB appearsto be non-relevant, we decided to repeat the previ-ous experiment but including lB in both models.The reason for this is twofold:

1. By adding the Belgian load, we ensure that thegood results of using French data are not dueto the fact that the model does not includespecific Belgian regressors.

2. Furthermore, with this experiment, we can alsovalidate the results of the feature selection al-gorithm. In particular, as the load does notseem to play a big role, we expect the per-formance difference between models with andwithout lB to be insignificant.

Similar as before, we refer to these models byMNoFR,lB and MFR,lB .

7.2.2. Case 1: Models Without lBIn this experiment, we compare MNoFR against

MFR by evaluating their performance on the year ofyet unused data represented by the test set. As in areal-world application, to account for the last avail-able information, the two models are re-estimatedafter a number days/weeks. In our application, con-sidering that a model takes around 2 minutes to betrained on the GPU, we decide to re-estimate themusing the smallest possible period of a day.

A first comparison of the models is listed in Table4 by means of sMAPE. From this first evaluation,we can see that including the French data seems toreally enhance the performance of the forecaster.

Model MNoFR MFR

sMAPE 16.0% 13.2%

Table 4: Performance comparison between MNoFR and MFR

in the out-of-sample data in terms of sMAPE.

15

To provide statistical significance to the above re-sult, we perform a DM test as described in Section7.1. The obtained results are depicted in Figure 6,where the test statistic is represented for each of the24 hours of a day and where the points above thedashed line accept, with a 95 % confidence level, thealternative hypothesis of MFR having better perfor-mance accuracy. As we can see from the plot, theforecast improvements of the model MFR includingFrench data are statistically significant for each oneof the 24 day-ahead prices.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240

5

Hour of the day

DM

statistic

Figure 6: DM test results when comparingMNoFR andMFR.Values above the dashed line reject the null hypothesis witha 95 % confidence level, and in turn, represent cases wherethe accuracy of MFR is significantly better.

When the DM test is performed on the full lossdifferential and taking into account serial correla-tion, the obtained metrics completely agree withthe results obtained for the individual 24 hours. Inparticular, the obtained p-value is 1.2 ·10−11, whichconfirms the strong statistical significance of usingthe French data in the prediction model.

7.2.3. Case 2: Models with lBUsing the same procedure, we compare MNoFR,lB

against MFR,lB . From Table 5 we can see how,as before, the model including French data outper-forms the alternative.

Model MNoFR,lB MFR,lB

sMAPE 15.7% 13.1%

Table 5: Performance comparison between MNoFR,lBand

MFR,lBin the out-of-sample data in terms of sMAPE.

To provide statistical significance to the obtainedaccuracy difference we again perform the DM tests.The obtained results are illustrated in Figure 7;as before, including French data leads to improve-ments in accuracy that are statistically significantfor the 24 predicted values. As before, when weconsider the DM test for the full loss differential

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 240

5

Hour of the day

DM

statistic

Figure 7: DM test results when comparing MNoFR,lBand

MFR,lB. Values above the dashed line reject the null hy-

pothesis at a 5% significance level, and in turn, representcases where the accuracy of MFR,lB

is significantly better.

with serial correlation, the p-value is 1.6 · 10−12, avalue that agrees with Figure 7 and confirms oncemore that the improvements of using French dataare statistically significant.

7.2.4. Accuracy of the Feature Selection

Using the results of the previous two sections,we can illustrate the accuracy of the proposed fea-ture selection algorithm in Section 6. In particu-lar, when performing the feature selection, we haveobserved that the contribution of the Belgian loadlB was rather insignificant and even slightly nega-tive; this led to discard lB as an input feature. Inthis section, to verify that the selection algorithmperformed the right choice, we perform DM teststo compare MNoFR,lB against MNoFR and MFR,lB

against MFR. In particular, we perform a two-sidedDM test per model pair with the null hypothesis ofthe models having equal accuracy.

For the sake of simplicity, we avoid depicting theDM test results for each individual hour; instead wedirectly illustrate the p-values of the DM test whenconsidering the whole loss differential sequence withserial correlation. As can be seen from Table 6,the obtained p-values for both tests are above 0.05,and as a result, the null hypothesis of equal accu-racy cannot be rejected, i.e. there is no statisticalevidence of the models using Belgian load havingdifferent accuracy than the models without it.

Model Pair p-value

MFR,lB vs MFR 0.435

MNoFR,lB vs MNoFR 0.275

Table 6: p-values for DM test with the null hypothesis ofmodels with lB having equal accuracy as models without it.

Based on the obtained results, it is clear that

16

using lB is not relevant, and thus, that the choiceperformed by the feature selection algorithm is cor-rect. In particular, while this experiment does notanalyze the performance of the feature selection onall the inputs, it does consider the most problem-atic feature. More specifically, as many researchershave successfully used the load as an explanatoryvariable [6–8, 27, 34] and as the load itself does notdisplay any regime change in the considered time in-terval, it is rather striking to see its minimal effecton the performance. Therefore, by demonstratingthat the algorithm is correct when discarding theload, we obtain an assessment of its general accu-racy, and we can conclude that the algorithm per-forms a correct feature selection.

7.3. Evaluation of a Dual-Market Forecaster

In this section, we evaluate the possible im-provements of using the dual-market forecasterand multi-tasking by comparing the single-marketmodel against the dual-market forecaster predictingthe day-ahead prices in Belgium and France. Themodels are denoted by MSingle and MDual and theyboth use the optimal features and hyperparametersobtained for the single-market model in Section 6.It is important to note that, while in an ideal exper-iment the hyperparameters of the dual-market fore-caster should be re-estimated, for simplicity we de-cided to directly use the hyperparameters obtainedfor the single-market forecaster.

The initial comparison is listed in Table 7. Fromthis first evaluation it seems that using dual-marketforecasts can improve the performance.

Model MSingle MDual

sMAPE 13.2% 12.5%

Table 7: Performance comparison between the single anddual-market forecasters in terms of sMAPE.

To provide statistical significance to these results,we again perform the DM test for each of the 24hours of a day. The obtained statistics are depictedin Figure 8; as before, the points above the upperdashed line accept, with a 95 % confidence level,the alternative hypothesis of MDual having a bet-ter performance accuracy. In addition, as not everyhourly forecast is statistically significant, we repre-sent in the same figure the alternative DM test withthe null hypothesis of MSingle having equal or loweraccuracy than MDual. This test is characterized by

the lower dashed line and any point below this lineaccepts, with a 95 % confidence level, that MSingle

has better performance accuracy.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24−5

0

5

Hour of the day

DM

statistic

Figure 8: DM test results when comparing MSingle andMDual. Values above the top dashed line represent caseswhere, with a 95 % confidence level, MDual is significantlybetter. Similarly, values below the lower dashed line ac-cept, with a 95 % confidence level, that MDual is significantlyworse.

As we can see from the plot, the forecast improve-ments of the dual-market forecaster are statisticallysignificant in 7 of the 24 day-ahead prices. In ad-dition, the single-market forecaster is not signifi-cantly better in any of the remaining 17 day-aheadprices. Therefore, as MDual is approximately betterfor a third of the day-ahead prices and not worse forthe remaining two-thirds, we can conclude that thedual-market forecaster is a statistically significantbetter forecaster.

Finally, we also perform the DM test on the fullloss differential considering serial correlation. Onceagain, the obtained metrics agree with the resultsobtained for the individual 24 hours: with a p-valueof 9.5 ·10−03, the test results confirm the statisticalsignificance of the difference in predictive accuracywhen using the dual-market forecaster.

7.4. Analysis and Discussion

To understand and explain the obtained results,we have to note that, as introduced in Section 2.4,market integration across European electricity mar-kets has been increasing over the years due to EUregulations. This highly nonlinear and complexeffect dramatically modifies the dynamics of elec-tricity prices and is behind the obtained improve-ments of our models. In particular, our forecastersuse this effect to outperform alternative techniquesthat have traditionally ignored it: the first fore-caster proposed, which models market integrationin the input space, obtains statistically significantimprovements w.r.t. to model counterparts that dis-regard market integration. The second proposedforecaster, which goes one step further by modeling

17

market integration in the output space, is shownto be crucial to obtain further significant improve-ments. For our case study, this translates to thefollowing conclusions:

1. Using features from the French market signif-icantly enhances the predictive accuracy of amodel forecasting Belgian prices. The resultsare statistically significant and independent ofwhether Belgian features are considered or not.

2. A dual-market forecaster simultaneously pre-dicting prices in France and Belgium can im-prove the predictive accuracy. In particular,by solving two related tasks, it is able to learnmore useful features, to better generalize theprice dynamics, and to obtain improvementsthat are statistically significant.

3. The proposed feature selection algorithm isable to perform a correct assessment of the im-portance of features.

In addition, it is interesting to see how explana-tory variables from the EPEX-Belgium, e.g. loadand generation, have almost no influence in the day-ahead prices. In fact, from the obtained results, itis surprising to observe how French factors play alarger role in Belgian prices than the local Belgianfeatures.

As a final discussion, it is necessary to indicatewhy, while being neighboring countries of Belgium,The Netherlands and Germany and their respec-tive markets have not been considered in the study.The reason for not considering The Netherlands isthe fact that the amount of available online datais smaller than in France and Belgium, and thus,training the DNNs can be harder. In the case ofGermany, the reason for not considering it is that,at the moment, there is not a direct interconnec-tion of the electrical grid between Belgium and Ger-many.

7.5. Practical Applications

As a last remark, it is important to point outthe different practical applications that these re-sults have. Particularly, there are two main obviousapplications where this research can be highly ben-eficial. The first and most important applicationis its usage by utility companies to increase theireconomic profit. More specifically, a 1% improve-ment in the MAPE of the forecasting accuracy re-sults in about 0.1%-0.35% cost reduction [59]. Fora medium-size utility company with a peak load of

5 GW, this translates to saving approximately $1.5million per year [60, 61].

In addition, improvements in forecasting accu-racy are key to have a stable electrical grid. Par-ticularly, as the integration of renewable energysources increases, so do the imbalances in the elec-trical grid due to mismatches between generationand consumption. To tackle this issue demand re-sponse methods [62–64] have been traditionally ap-plied. By accurate forecasting electricity prices it isalso possible to improve the situation. In particu-lar, prices are usually low (high) when generation islarger (lower) than consumption. Therefore, giventhe right forecasts, market agents have economicincentives to buy (sell) energy when prices are low(high), and in turn, to reduce the grid imbalances.Therefore, using accurate price forecasting, marketagents can steered and motivated so that grid im-balances are reduced.

8. Conclusions

We have analyzed how market integration canbe used to enhance the predictive accuracy of day-ahead price forecasting in electricity markets. Inparticular, we have proposed a first model that, byconsidering features from connected markets, im-proves the predictive performance. In addition, wehave proposed a dual-market forecaster that, bymultitasking and due to market integration, canfurther improve the predictive accuracy. As a casestudy, we have considered the electricity markets inBelgium and France. Then, we have showed how,considering market integration, the proposed fore-casters lead to improvements that are statisticallysignificant. Additionally, we have proposed a novelfeature selection algorithm and using the same casestudy, we have shown how the algorithm correctlyassesses feature importance.

In view of these results, it is clear that market in-tegration can play a large role in electricity prices.In particular, the influence of neighboring marketsseems to be important enough to build statisticallysignificant differences in terms of forecasting accu-racy. As a consequence, as the EU has implementedregulations to form an integrated EU market butthere is still little insight in the outcome of suchregulations, these results are important in terms ofpolicy making. In particular, the fact that marketintegration largely modifies the price dynamics be-tween Belgium and France is an indicator that theregulations that were put in place are working. As

18

a result, using the proposed methodology, policymakers can benefit from a general tool to evaluatethe market integration regulations in other EU re-gions.

In addition, these results are also of high impor-tance in terms of grid stability and economic profitof market agents. In particular, as the knowledgeof the dynamics of electricity prices increases, thegrid operator might be able to better prevent someof the grid imbalances characterized by large pricepeaks. The increased knowledge is also economi-cally beneficial for market agents: a 1 % improve-ment in MAPE accuracy translates to savings of$1.5 million per year for a medium-size utility com-pany.

As a first step to help policy markets, in futurework the performed experiments will be expandedto the other European markets.

Acknowledgment

This research has received funding from the Euro-pean Union’s Horizon 2020 research and innovationprogram under the Marie Sk lodowska-Curie grantagreement No 675318 (INCITE).

References

[1] R. Weron, Electricity price forecasting: A review ofthe state-of-the-art with a look into the future, Interna-tional Journal of Forecasting 30 (4) (2014) 1030–1081.doi:10.1016/j.ijforecast.2014.08.008.

[2] T. Jamasb, M. Pollitt, Electricity market reform in theEuropean union: review of progress toward liberaliza-tion & integration, The Energy Journal 26 (2005) 11–41. doi:10.5547/issn0195-6574-ej-vol26-nosi-2.

[3] R. Weron, A. Misiorek, Forecasting spot elec-tricity prices: A comparison of parametric andsemiparametric time series models, InternationalJournal of Forecasting 24 (4) (2008) 744–763.doi:10.1016/j.ijforecast.2008.08.004.

[4] J. Crespo Cuaresma, J. Hlouskova, S. Koss-meier, M. Obersteiner, Forecasting electricityspot-prices using linear univariate time-seriesmodels, Applied Energy 77 (1) (2004) 87–106.doi:10.1016/S0306-2619(03)00096-5.

[5] Z. Yang, L. Ce, L. Lian, Electricity price forecast-ing by a hybrid model, combining wavelet trans-form, ARMA and kernel-based extreme learning ma-chine methods, Applied Energy 190 (2017) 291–305.doi:10.1016/j.apenergy.2016.12.130.

[6] F. J. Nogales, J. Contreras, A. J. Conejo, R. Espınola,Forecasting next-day electricity prices by time seriesmodels, IEEE Transactions on Power Systems 17 (2)(2002) 342–348. doi:10.1109/MPER.2002.4312063.

[7] A. Cruz, A. Munoz, J. Zamora, R. Espınola,The effect of wind generation and weekday on

Spanish electricity spot price forecasting, ElectricPower Systems Research 81 (10) (2011) 1924–1935.doi:10.1016/j.epsr.2011.06.002.

[8] A. Misiorek, S. Trueck, R. Weron, Point and in-terval forecasting of spot electricity prices: Linearvs. non-linear time series models, Studies in Nonlin-ear Dynamics & Econometrics 10 (3) (2006) 1–36.doi:10.2202/1558-3708.1362.

[9] A. K. Diongue, D. Guegan, B. Vignal, Forecastingelectricity spot market prices with a k-factor GI-GARCH process, Applied Energy 86 (4) (2009) 505–510. doi:10.1016/j.apenergy.2008.07.005.

[10] A. Conejo, M. Plazas, R. Espinola, A. Molina,Day-ahead electricity price forecasting using thewavelet transform and ARIMA models, IEEE Trans-actions on Power Systems 20 (2) (2005) 1035–1042.doi:10.1109/TPWRS.2005.846054.

[11] Z. Tan, J. Zhang, J. Wang, J. Xu, Day-aheadelectricity price forecasting using wavelet trans-form combined with ARIMA and GARCH mod-els, Applied Energy 87 (11) (2010) 3606–3610.doi:10.1016/j.apenergy.2010.05.012.

[12] N. Amjady, M. Hemmati, Energy price forecasting -problems and proposals for such predictions, IEEEPower and Energy Magazine 4 (2) (2006) 20–29.doi:10.1109/MPAE.2006.1597990.

[13] B. Szkuta, L. Sanabria, T. Dillon, Electricity priceshort-term forecasting using artificial neural networks,IEEE Transactions on Power Systems 14 (3) (1999)851–857. doi:10.1109/59.780895.

[14] J. P. S. Catalao, S. J. P. S. Mariano, V. M. F. Mendes,L. A. F. M. Ferreira, Short-term electricity prices fore-casting in a competitive market: A neural network ap-proach, Electric Power Systems Research 77 (10) (2007)1297–1304. doi:10.1016/j.epsr.2006.09.022.

[15] L. Xiao, W. Shao, M. Yu, J. Ma, C. Jin, Research andapplication of a hybrid wavelet neural network modelwith the improved cuckoo search algorithm for electricalpower system forecasting, Applied Energy 198 (2017)203–222. doi:10.1016/j.apenergy.2017.04.039.

[16] D. Wang, H. Luo, O. Grunder, Y. Lin, H. Guo,Multi-step ahead electricity price forecasting usinga hybrid model based on two-layer decompositiontechnique and BP neural network optimized by fire-fly algorithm, Applied Energy 190 (2017) 390–407.doi:10.1016/j.apenergy.2016.12.134.

[17] S. Fan, C. Mao, L. Chen, Next-day electricity-priceforecasting using a hybrid network, IET Generation,Transmission & Distribution 1 (1) (2007) 176–182.doi:10.1049/iet-gtd:20060006.

[18] W.-M. Lin, H.-J. Gow, M.-T. Tsai, An enhanced radialbasis function network for short-term electricity priceforecasting, Applied Energy 87 (10) (2010) 3226–3234.doi:10.1016/j.apenergy.2010.04.006.

[19] N. Amjady, Day-ahead price forecasting of electric-ity markets by a new fuzzy neural network, IEEETransactions on Power Systems 21 (2) (2006) 887–896.doi:10.1109/tpwrs.2006.873409.

[20] J. Lago, F. De Ridder, B. De Schutter, Forecasting spotelectricity prices: deep learning approaches and empir-ical comparison of traditional algorithms, Applied En-ergy (Submitted).

[21] L. Meeus, R. Belmans, Electricity market integrationin Europe, in: Proceedings of the 16th Power SystemsComputation Conference, 2008.

19

http://dx.doi.org/10.1016/j.ijforecast.2014.08.008

http://dx.doi.org/10.5547/issn0195-6574-ej-vol26-nosi-2

http://dx.doi.org/10.1016/j.ijforecast.2008.08.004

http://dx.doi.org/10.1016/S0306-2619(03)00096-5

http://dx.doi.org/10.1016/j.apenergy.2016.12.130

http://dx.doi.org/10.1109/MPER.2002.4312063

http://dx.doi.org/10.1016/j.epsr.2011.06.002

http://dx.doi.org/10.2202/1558-3708.1362


http://dx.doi.org/10.1109/TPWRS.2005.846054


http://dx.doi.org/10.1109/MPAE.2006.1597990

http://dx.doi.org/10.1109/59.780895

http://dx.doi.org/10.1016/j.epsr.2006.09.022



http://dx.doi.org/10.1049/iet-gtd:20060006


http://dx.doi.org/10.1109/tpwrs.2006.873409

[22] D. W. Bunn, A. Gianfreda, Integration and shocktransmissions across European electricity forward mar-kets, Energy Economics 32 (2) (2010) 278–291.doi:10.1016/j.eneco.2009.09.005.

[23] L. M. de Menezes, M. A. Houllier, Reassessing the inte-gration of European electricity markets: A fractionalcointegration analysis, Energy Economics 53 (2016)132–150. doi:10.1016/j.eneco.2014.10.021.

[24] G. Zachmann, Electricity wholesale market prices inEurope: Convergence?, Energy Economics 30 (4) (2008)1659–1671. doi:10.1016/j.eneco.2007.07.002.

[25] E. Lindstrom, F. Regland, Modeling extremedependence between European electricity mar-kets, Energy Economics 34 (4) (2012) 899–904.doi:10.1016/j.eneco.2012.04.006.

[26] F. Ziel, R. Steinert, S. Husmann, Forecast-ing day ahead electricity spot prices: The im-pact of the EXAA to other European electricitymarkets, Energy Economics 51 (2015) 430–444.doi:10.1016/j.eneco.2015.08.005.

[27] I. P. Panapakidis, A. S. Dagoumas, Day-ahead electric-ity price forecasting via the application of artificial neu-ral network based models, Applied Energy 172 (2016)132–151. doi:10.1016/j.apenergy.2016.03.089.

[28] I. Guyon, A. Elisseeff, An introduction to variable andfeature selection, Journal of Machine Learning Research3 (2003) 1157–1182.

[29] J. A. Carta, P. Cabrera, J. M. Matıas, F. Castel-lano, Comparison of feature selection methodsusing ANNs in MCP-wind speed methods. acase study, Applied Energy 158 (2015) 490–507.doi:10.1016/j.apenergy.2015.08.102.

[30] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning,MIT Press, 2016, http://www.deeplearningbook.org/.

[31] M. Stevenson, Filtering and forecasting spot elec-tricity prices in the increasingly deregulated aus-tralian electricity market, in: QFRC ResearchPaper Series, no. 63, Quantitative Finance Re-search Centre, University of Technology, Sydney, 2001,http://www.qfrc.uts.edu.au/research/research_papers/rp63.pdf.

[32] C. P. Rodriguez, G. J. Anders, Energy price forecastingin the Ontario competitive power system market, IEEETransactions on Power Systems 19 (1) (2004) 366–374.doi:10.1109/TPWRS.2003.821470.

[33] Y. Hong, C. Wu, Day-ahead electricity price fore-casting using a hybrid principal component anal-ysis network, Energies 5 (11) (2012) 4711–4725.doi:10.3390/en5114711.

[34] N. Amjady, F. Keynia, Day-ahead price forecasting ofelectricity markets by mutual information techniqueand cascaded neuro-evolutionary algorithm, IEEETransactions on Power Systems 24 (1) (2009) 306–318.doi:10.1109/tpwrs.2008.2006997.

[35] N. Amjady, A. Daraeepour, F. Keynia, Day-aheadelectricity price forecasting by modified relief algo-rithm and hybrid neural network, IET Generation,Transmission & Distribution 4 (3) (2010) 432–444.doi:10.1049/iet-gtd.2009.0297.

[36] D. Keles, J. Scelle, F. Paraschiv, W. Ficht-ner, Extended forecast methods for day-aheadelectricity spot prices applying artificial neuralnetworks, Applied Energy 162 (2016) 218–230.doi:10.1016/j.apenergy.2015.09.087.

[37] A. Ghasemi, H. Shayeghi, M. Moradzadeh, M. Noosh-yar, A novel hybrid algorithm for electricity price

and load forecasting in smart grids with demand-side management, Applied Energy 177 (2016) 40–59.doi:10.1016/j.apenergy.2016.05.083.

[38] O. Abedinia, N. Amjady, H. Zareipour, A new fea-ture selection technique for load and price fore-cast of electrical power systems, IEEE Transac-tions on Power Systems 32 (1) (2017) 62–74.doi:10.1109/TPWRS.2016.2556620.

[39] S. Ruder, An overview of gradient descent optimizationalgorithms, arXiv eprint (2016). arXiv:1609.04747.

[40] M. Shafie-Khah, M. P. Moghaddam, M. Sheikh-El-Eslami, Price forecasting of day-ahead electricity mar-kets using a hybrid forecast method, Energy Con-version and Management 52 (5) (2011) 2165–2169.doi:10.1016/j.enconman.2010.10.047.

[41] D. R. Jones, M. Schonlau, W. J. Welch, Efficient globaloptimization of expensive black-box functions, Jour-nal of Global Optimization 13 (4) (1998) 455–492.doi:10.1023/A:1008306431147.

[42] J. Bergstra, R. Bardenet, Y. Bengio, B. Kegl,Algorithms for hyper-parameter optimiza-tion, in: Advances in Neural InformationProcessing Systems, 2011, pp. 2546–2554,http://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-

[43] F. Hutter, H. H. Hoos, K. Leyton-Brown, Sequentialmodel-based optimization for general algorithm config-uration, in: International Conference on Learning andIntelligent Optimization, Springer, 2011, pp. 507–523.doi:10.1007/978-3-642-25566-3_40.

[44] F. Hutter, H. Hoos, K. Leyton-Brown, An effi-cient approach for assessing hyperparameter impor-tance, in: Proceedings of the 31st InternationalConference on International Conference on MachineLearning, Vol. 32 of ICML’14, 2014, pp. 754–762,http://proceedings.mlr.press/v32/hutter14.pdf.

[45] S. Makridakis, Accuracy measures: theo-retical and practical concerns, InternationalJournal of Forecasting 9 (4) (1993) 527–529.doi:10.1016/0169-2070(93)90079-3.

[46] F. X. Diebold, R. S. Mariano, Comparingpredictive accuracy, Journal of Business &Economic Statistics 13 (3) (1995) 253–263.doi:10.1080/07350015.1995.10524599.

[47] Y. LeCun, L. Bottou, G. B. Orr, K.-R. Muller, EfficientBackProp, in: G. B. Orr, K.-R. Muller (Eds.), Neu-ral Networks: Tricks of the Trade, no. 1524 in LectureNotes in Computer Science, Springer Berlin Heidelberg,1998, pp. 9–50. doi:10.1007/3-540-49430-8_2.

[48] RTE, Grid data, https://data.rte-france.com/. Ac-cessed on 15.05.2017.

[49] Elia, Grid data, http://www.elia.be/en/grid-data/dashboard.Accessed on 15.05.2017.

[50] ENTSO-E transparency platform,https://transparency.entsoe.eu/. Accessed on15.05.2017.

[51] V. Nair, G. E. Hinton, Rectified linear units im-prove restricted boltzmann machines, in: Pro-ceedings of the 27th international Conference onMachine Learning (ICML), 2010, pp. 807–814,http://icml2010.haifa.il.ibm.com/papers/432.pdf.

[52] X. Glorot, Y. Bengio, Understanding the difficulty oftraining deep feedforward neural networks, in: Proceed-ings of the International Conference on Artificial Intel-ligence and Statistics (AISTATS’10). Society for Artifi-cial Intelligence and Statistics, 2010, pp. 249–256.

20

http://dx.doi.org/10.1016/j.eneco.2009.09.005







http://www.deeplearningbook.org/

http://www.qfrc.uts.edu.au/research/research_papers/rp63.pdf


http://dx.doi.org/10.3390/en5114711

http://dx.doi.org/10.1109/tpwrs.2008.2006997

http://dx.doi.org/10.1049/iet-gtd.2009.0297




http://arxiv.org/abs/1609.04747

http://dx.doi.org/10.1016/j.enconman.2010.10.047

http://dx.doi.org/10.1023/A:1008306431147

http://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization

http://dx.doi.org/10.1007/978-3-642-25566-3_40

http://proceedings.mlr.press/v32/hutter14.pdf

http://dx.doi.org/10.1016/0169-2070(93)90079-3

http://dx.doi.org/10.1080/07350015.1995.10524599

http://dx.doi.org/10.1007/3-540-49430-8_2

https://data.rte-france.com/

http://www.elia.be/en/grid-data/dashboard

https://transparency.entsoe.eu/

http://icml2010.haifa.il.ibm.com/papers/432.pdf

[53] D. P. Kingma, J. Ba, Adam: A method for stochasticoptimization, arXiv eprint (2014). arXiv:1412.6980.

[54] Y. Yao, L. Rosasco, A. Caponnetto, On earlystopping in gradient descent learning, Con-structive Approximation 26 (2) (2007) 289–315.doi:10.1007/s00365-006-0663-2.

[55] J. Yosinski, J. Clune, Y. Bengio, H. Lipson,How transferable are features in deep neural net-works?, in: Z. Ghahramani, M. Welling, C. Cortes,N. D. Lawrence, K. Q. Weinberger (Eds.), Ad-vances in Neural Information Processing Systems27, Curran Associates, Inc., 2014, pp. 3320–3328,https://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks.

[56] M. Jaderberg, V. Mnih, W. M. Czarnecki, T. Schaul,J. Z. Leibo, D. Silver, K. Kavukcuoglu, Reinforcementlearning with unsupervised auxiliary tasks, arXiv eprint(2016). arXiv:1611.05397.

[57] X. Li, L. Zhao, L. Wei, M.-H. Yang, F. Wu, Y. Zhuang,H. Ling, J. Wang, DeepSaliency: Multi-Task Deep Neu-ral Network model for salient object detection, IEEETransactions on Image Processing 25 (8) (2016) 3919–3930. doi:10.1109/TIP.2016.2579306.

[58] J. Bergstra, D. Yamins, D. D. Cox, Making a sci-ence of model search: Hyperparameter optimiza-tion in hundreds of dimensions for vision architec-tures, in: Proceedings of the 30th International Con-ference on Machine Learning, 2013, pp. 115–123,http://proceedings.mlr.press/v28/bergstra13.pdf.

[59] H. Zareipour, C. A. Canizares, K. Bhattacharya, Eco-nomic impact of electricity market price forecast-ing errors: A demand-side analysis, IEEE Trans-actions on Power Systems 25 (1) (2010) 254–262.doi:10.1109/TPWRS.2009.2030380.

[60] T. Hong, Crystal ball lessons in predictive analytics,EnergyBiz 12 (2) (2015) 35–37.

[61] B. Uniejewski, J. Nowotarski, R. Weron, Automatedvariable selection and shrinkage for day-ahead elec-tricity price forecasting, Energies 9 (8) (2016) 621.doi:10.3390/en9080621.

[62] J. Wang, H. Zhong, Z. Ma, Q. Xia, C. Kang, Review andprospect of integrated demand response in the multi-energy system, Applied Energy 202 (2017) 772–782.doi:10.1016/j.apenergy.2017.05.150.

[63] S. Nolan, M. O’Malley, Challenges and bar-riers to demand response deployment andevaluation, Applied Energy 152 (2015) 1–10.doi:10.1016/j.apenergy.2015.04.083.

[64] Q. Wang, C. Zhang, Y. Ding, G. Xydis, J. Wang,J. Østergaard, Review of real-time electricity marketsfor integrating distributed energy resources and de-mand response, Applied Energy 138 (2015) 695–706.doi:10.1016/j.apenergy.2014.10.048.

21


http://dx.doi.org/10.1007/s00365-006-0663-2

https://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks


http://dx.doi.org/10.1109/TIP.2016.2579306

http://proceedings.mlr.press/v28/bergstra13.pdf


http://dx.doi.org/10.3390/en9080621




forecasting day-ahead electricity prices in europe: the ...bdeschutter/pub/rep/17_021.pdf ·...

Documents