photovoltaic output prediction using support vector machines

eeh power systemslaboratory

Abhishek Rohatgi

Machine Learning Methods for PowerMarkets

Master ThesisPSL 1217

EEH – Power Systems LaboratorySwiss Federal Institute of Technology (ETH) Zurich

Examiner: Prof. Dr. Goran AnderssonSupervisor: Marcus Hildmann

Zurich, September 17, 2012

ii

Do not believe in anything simply because you have heard it.Do not believe in anything simply because it is spoken and rumored by many.Do not believe in anything simply because it is found written in your religiousbooks.Do not believe in anything merely on the authority of your teachers andelders.Do not believe in traditions because they have been handed down for manygenerations.But after observation and analysis, when you find that anything agrees withreason and is conducive to the good and benefit of one and all, then acceptit and live up to it.

-Gautama Buddha

Preface

This report is a result of my master thesis carried out at Power SystemLaboratories (PSL), ETH Zurich from March, 2012 to August, 2012. Usingthis opportunity, I would like to thank several people who gave me fullsupport during my days at PSL. I would like to thank Marcus Hildmannfor his support, useful ideas and comments on my work. I am thankful toProf. Goran Andersson for giving me the opportunity for the project workat PSL. Last but not least, i am grateful to my colleagues at PSL for thewelcoming atmosphere and fruitful discussions.

Abhishek RohatgiZurich, September 17, 2012

iii

Abstract

To manage risks in electricity markets, forecasting of market variables likespot price, load demand and Hourly price Forward Curve is an importantresearch area. Forecasting using linear estimation methods suffer from theproblem of under-fitting and over-fitting. Ordinary Least Squares (OLS)which is a popular linear estimation method, estimates the mean of thedata. Profile forecasting of time series needs non-linear estimation methods.In this thesis, Support Vector Machines (SVM) and Extreme Learning Ma-chines (ELM) are used for estimation of time series. The thesis consists oftwo parts. In the first part, SVM and ELM algorithms are presented andsimulated for forecasting of spot price time series. In the second half of thethesis, the problem of constrained estimation is explained and two methodsare suggested based on SVM and ELM theory.Support Vector Machine is a machine learning algorithm used for functionestimation. It converts the non-linear function estimation problem to aconvex optimization problem by using functions called kernels. ExtremeLearning Machine is a learning algorithm for Single Layer Feedforward Neu-ral Networks and it estimates the function using Moore-Penrose generalizedinverse matrix of the activation function of the neural network.A case study of spot price forecasting for Germany is presented using bothof the learning algorithms. After identifying the characteristics of the spotprice time series, a Non-Linear Autoregressive model with Exogenous inputs(NARX) is proposed to capture the dynamics of spot prices. To simulate thespot prices, a computationally simple version of SVM called Least SquareSupport Vector Machine (LSSVM) is used. 1-day ahead, 3-day ahead and5-day ahead forecasting is simulated for different lags of the spot price timeseries. LSSVM performs better than ELM for out of sample forecasting.However, parameters of LSSVM need to be tuned before training. The tun-ing process is computationally intensive. Hence, ELM is much faster thanLSSVM. Additionally, out of sample residuals are analyzed for autocorrela-tion to establish the validity of the model.Constrained Estimation means to solve the model subjected to external con-straints. It is required for estimation of time series like HPFC, PV in-feedetc. A proposal is made to include the external constraints in the SVMand ELM theory. The proposed SVM and ELM is applied to a case study

v

vi

of Photovoltaic in-feed forecasting. Results are presented for a few testconstraints. Both SVM and ELM produce good results for constrained esti-mation. Finally, the thesis is concluded with a discussion of the future workon the application of SVM and ELM for time series analysis and constrainedestimation.

Kurzfassung

Um mit den Risiken auf den Strommarkten umzugehen, ist die Prognos-tizierung von Variablen wie zum Beispiel dem Spotpreis, den Lasten undder Stundenterminpreis-Kurve (Hourly Price Forward Curce, HPFC) einwichtiges Forschungsgebiet. Prognoseverfahre, die auf linearen Schatzungenbasieren leiden an den Problemen der Unteranpassung (unter-fitting) undUberanpassung (over-fitting). Die Methode der kleinsten Quadrate (Ordi-nary Least Squares, OLS), die eine beliebte lineare Schatzungsmethode ist,schatzt den Mittelwert der Daten. Die Prognose von Zeitreihen benotigt je-doch nichtlineare Schatzverfahren. In dieser Arbeit werden Support VectorMachines (SVM) und Extreme Learning Machines (ELM) fur die Schatzungvon Zeitreihen verwendet. Die Arbeit besteht aus zwei Teilen. Im erstenTeil werden SVM und ELM Algorithmen vorgestellt und fur die Prognos-tizierung von Spotpreis Zeitreihen eingesetzt. In der zweiten Halfte derArbeit wird das Problem mit der Schatzung mit Nebenbedingungen erklartund es werden zwei Methoden, die auf SVM und ELM Theorie basieren,vorgeschlagen.

Eine Support Vector Machine ist ein Konzept aus dem Bereich des maschi-nellen Lernens, das fur Schatzungen und Regressionen eingesetzt werdenkann. Es verwandelt ein nicht-lineares Schatzproblem in ein konvexes Op-timierungsproblem unter Verwendung von sogenannten Kernels. Eine Ex-treme Learning Machine ist ein Lernalgorithmus fur Single Layer Feedfor-ward Neural Networks und schatzt die Funktion mit der allgemeinen Moore-Penrose Inversmatrix der Aktivierungsfunktion des neuronalen Netzes.

Eine Fallstudie uber die Spotpreisprognose fur Deutschland wird furbeide Lernalgorithmen prasentiert. Nach der Identifizierung der Eigen-schaften der Spotpreis Zeitreihe wird ein nicht-lineares autoregressives Mod-ell mit exogenen Inputs (Non-Linear Autoregressive Model with ExogenousInputs - NARX) vorgeschlagen, um die Dynamik der Spotpreise zu erfassen.Um die Spotpreise zu simulieren, wird eine vereinfachte Version einer SVM,die sogenannte Least Square Support Vector Machine (LSSVM) verwen-det. One day ahead, three day ahead und five day ahead Prognosen wer-den fur verschiedene Verzogerungen der Spotpreis Zeitreihen erstellt. DieLSSVM fuhrt zu besseren Ergebnissen als die ELM fur out of sample Prog-nosen. Allerdings mussen die Parameter der LSSVM vor dem Training

vii

viii

abgestimmt werden. Dieser Tuning-Prozess ist rechenintensiv. Daher istdie ELM viel schneller als die LSSVM. Zusatzlich werden out of sampleresiduals analysiert um per Autokorrelation die Gultigkeit des Modells zuetablieren.

Die Schatzung mit Nebenbedingungen bedeutet, das Modell unter derBerucksichtigung von außeren Bedingungen zu losen. Diese Art der Schatzungwird fur die Prognostizierung von Zeitreihen wie der HPFC, und Foto-voltaikeinspeisung benotigt. Ein Vorschlag, die notigen externen Nebenbe-dingungen in die SVM und ELM Theorie zu integrieren, wird gemacht. Dievorgeschlagenen SVM und ELM werden in einer Fallstudie uber die Prognosevon Fotovoltaikeinspeisung angewendet. Ergebnisse fur einige Testnebenbe-dingungen werden vorgestellt. Die SVM und ELM fuhren beide zu gutenErgebnissen fur die Schatzung mit Nebenbedingungen. Die Arbeit schliesstmit einer Diskussion uber zukunftige Forschungsmoglichkeiten auf dem Ge-biet der Anwendung von SVM und ELM fur die Zeitreihenanalyse und dieSchatzung mit Nebenbedingungen ab.

Contents

List of Acronyms xiii

List of Symbols xv

1 Introduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3 Literature Survey . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3.1 Machine Learning Algorithms . . . . . . . . . . . . . . 2

1.3.2 Estimation and Prediction for Electricity Markets . . 3

I Forecasting in Electricity Markets 5

2 Electricity Market Analysis 7

2.1 Electricity Market Deregulation . . . . . . . . . . . . . . . . . 7

2.2 Risks in Electricity Markets . . . . . . . . . . . . . . . . . . . 8

2.3 Non-Linear Problems in Electricity Markets . . . . . . . . . . 9

3 SVM and ELM 11

3.1 Support Vector Machines . . . . . . . . . . . . . . . . . . . . 11

3.1.1 Linear SVM . . . . . . . . . . . . . . . . . . . . . . . . 11

3.1.2 Nonlinear Extension of SVM . . . . . . . . . . . . . . 13

3.1.3 Kernels . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.1.4 Least Square Support Vector Machines . . . . . . . . . 16

3.1.5 Tuning parameters . . . . . . . . . . . . . . . . . . . . 17

3.2 Extreme Learning Machines . . . . . . . . . . . . . . . . . . . 18

3.2.1 ELM theory . . . . . . . . . . . . . . . . . . . . . . . . 18

4 Case Study - Spot Price Forecasting 21

4.1 Characteristics of Electricity Prices . . . . . . . . . . . . . . . 21

4.2 Methods for Price forecasting . . . . . . . . . . . . . . . . . . 22

4.2.1 Importance and need of price forecasting . . . . . . . 22

4.2.2 Modeling of Electricity Prices . . . . . . . . . . . . . . 23

ix

x CONTENTS

5 Model Representation and Estimation 27

5.1 Time Series Analysis . . . . . . . . . . . . . . . . . . . . . . . 27

5.2 Short Term Spot Price Model . . . . . . . . . . . . . . . . . . 27

5.2.1 Forecast Accuracy Measures . . . . . . . . . . . . . . . 31

6 Empirical Analysis- Price Forecasting 33

6.1 Spot Price Model . . . . . . . . . . . . . . . . . . . . . . . . . 33

6.2 LSSVM and ELM - Simulation results for Spot Prices . . . . 35

6.2.1 Parameter Selection for LSSVM . . . . . . . . . . . . 35

6.2.2 Training Results . . . . . . . . . . . . . . . . . . . . . 35

6.2.3 Forecasting Performance . . . . . . . . . . . . . . . . . 38

6.2.4 Residual Analysis . . . . . . . . . . . . . . . . . . . . . 39

6.3 Forecast Accuracy Analysis . . . . . . . . . . . . . . . . . . . 40

6.4 Transition Case . . . . . . . . . . . . . . . . . . . . . . . . . . 44

6.5 LSSVM and ELM - Execution Time . . . . . . . . . . . . . . 44

II Constrained Estimation 49

7 Constrained Estimation 51

7.1 SVM and ELM for constrained estimation . . . . . . . . . . . 52

7.2 SVM with external constraints . . . . . . . . . . . . . . . . . 52

7.2.1 Solving the dual problem . . . . . . . . . . . . . . . . 53

7.2.2 Solving with the random feature space . . . . . . . . . 54

7.3 ELM with external constraints . . . . . . . . . . . . . . . . . 55

7.3.1 Optimization based ELM . . . . . . . . . . . . . . . . 56

7.4 Results for a artificial known process . . . . . . . . . . . . . . 57

7.4.1 Random feature space based SVM . . . . . . . . . . . 57

7.4.2 ELM variant . . . . . . . . . . . . . . . . . . . . . . . 58

8 Case Study - PV Infeed Forecasting 63

8.1 Photovoltaic Infeed forecast model . . . . . . . . . . . . . . . 63

8.2 Characteristics of PV infeed . . . . . . . . . . . . . . . . . . . 63

8.3 Model for PV infeed . . . . . . . . . . . . . . . . . . . . . . . 66

8.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 66

8.5 SVM results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

8.5.1 SVM with random feature space without constraints . 67

8.5.2 SVM with random feature space with constraints . . . 67

8.6 ELM results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

8.6.1 ELM-variant with constraints . . . . . . . . . . . . . . 72

9 Conclusion 75

A LSSVM and ELM complete results 77

CONTENTS xi

B Complete Results for Constrained Estimation 83

Bibliography 89

xii CONTENTS

List of Acronyms

SVM Support Vector Machine

ELM Extreme Learning Machine

ANN Artificial Neural Networks

TSO Transmission System Operator

ACF Autocorrelation Factor

NARX Non-Linear Autoregressive with External Inputs

AR Autoregressive

MAE Mean Absolute Error

MAPE Mean Absolute Percentage Error

OLS Ordinary Least Squares

EPEX European Power Exchange

SLFN Single Layer Feedforward Network

LSSVM Least Square Support Vector Machine

xiii

xiv CONTENTS

List of Symbols

xk, ykNk=1 Training data, xk is input, yk is output, N is the number of data pointsφ High dimensional feature space for SVMH Activation function matrix for ELMσ Gaussian Kernel parameterγ Regularization parameter of LSSVMK(xi, xj) Kernel function defined for xi and xjβ Output weight of the SLFNWeat Weather variablesTmaxt maximum temperatureTmint minimum temperatureTmeant mean temperatureW st wind speed

PPt precipitationHt Dummy variable for hours of a dayDt Dummy variable for days of a weekMt Dummy variable for Months of a yearLt Vertical Load

xv

xvi CONTENTS

List of Figures

2.1 Deregulation of Electricity Industry . . . . . . . . . . . . . . 7

2.2 Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.1 Primal and dual optimization problem of SVM . . . . . . . . 15

3.2 Single Layer Feedforward Neural Network . . . . . . . . . . . 18

4.1 Combining Fundamental and Quantitative Approach for priceforecasting [1] . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

5.1 Steps in time series forecasting . . . . . . . . . . . . . . . . . 28

5.2 Autocorrelation of the EPEX prices for March 2012 . . . . . 29

5.3 Correlation of electricity prices and load . . . . . . . . . . . . 29

5.4 Correlation of electricity prices and temperature . . . . . . . 30

5.5 Seasonality of electricity prices . . . . . . . . . . . . . . . . . 31

5.6 NARX model for electricity spot prices . . . . . . . . . . . . . 32

6.1 Spot price time series for Germany(Feb, 2012 to May, 2012) . 33

6.2 Autocorrelation of the Spot price time series . . . . . . . . . . 34

6.3 Cross validation scores(1 day lag) . . . . . . . . . . . . . . . . 35



6.6 LSSVM training results . . . . . . . . . . . . . . . . . . . . . 37

6.7 ELM training results . . . . . . . . . . . . . . . . . . . . . . . 37

6.8 LSSVM In Sample fit - 5 Day forecast . . . . . . . . . . . . . 38

6.9 LSSVM Out of Sample fit - 5 Day forecast . . . . . . . . . . . 39

6.10 Autocorrelation of Out of Sample residuals for LSSVM . . . . 40

6.11 Autocorrelation of Out of Sample residuals for ELM . . . . . 41

6.12 Histogram of Out of Sample residuals for LSSVM . . . . . . . 41

6.13 Histogram of Out of Sample residuals for ELM . . . . . . . . 42

6.14 In-Sample spot prices for transition case . . . . . . . . . . . . 44

6.15 Out of Sample spot prices for transition case . . . . . . . . . 45

6.16 LSSVM and ELM performance for transition case (in-sample) 45

6.17 LSSVM and ELM performance for transition case (out of sam-ple) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

xvii

xviii LIST OF FIGURES

6.18 MATLAB profiler results for LSSVM . . . . . . . . . . . . . . 476.19 MATLAB profiler results for ELM . . . . . . . . . . . . . . . 48

7.1 Constrained Estimation . . . . . . . . . . . . . . . . . . . . . 517.2 Random Feature Space based SVM for constrained estimation 547.3 ELM-variant with external constraints . . . . . . . . . . . . . 577.4 SVM results (Out of Sample) . . . . . . . . . . . . . . . . . . 587.5 SVM results for constraint 1(Out of Sample) . . . . . . . . . 597.6 SVM results for constraint 2(Out of Sample) . . . . . . . . . 597.7 ELM results for constraint 1(Out of Sample) . . . . . . . . . 607.8 ELM results for constraint 1(Out of Sample) . . . . . . . . . 607.9 ELM results for constraint 2(Out of Sample) . . . . . . . . . 61

8.1 Cross correlation of PV infeed time series for 2011 and 2012 . 648.2 Multi Scale Seasonality of PV infeed . . . . . . . . . . . . . . 658.3 Correlation of PV infeed and Mean Temperature . . . . . . . 658.4 PV infeed time series from March 2012 to Jun 2012 . . . . . . 678.5 Training results (SVM with random feature space,without

constraints) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 688.6 In Sample results (SVM with random feature space,without

constraints) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 688.7 Out of Sample results (SVM with random feature space,without

constraints) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698.8 In Sample results (SVM with random feature space,with con-

straints 1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708.9 Out of Sample results (SVM with random feature space,with

constraints 1) . . . . . . . . . . . . . . . . . . . . . . . . . . . 708.10 Out of Sample results (SVM with random feature space,with

constraints 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . 718.11 Out of Sample results (ELM with constraints 1) . . . . . . . . 728.12 Out of Sample results (ELM with constraints 2) . . . . . . . . 73

A.1 LSSVM In Sample fit - 1 Day forecast . . . . . . . . . . . . . 77A.2 LSSVM In Sample fit - 3 Day forecast . . . . . . . . . . . . . 78A.3 LSSVM Out of Sample fit - 1 Day forecast . . . . . . . . . . . 78A.4 LSSVM Out of Sample fit - 3 Day forecast . . . . . . . . . . . 79A.5 ELM In Sample fit - 1 Day forecast . . . . . . . . . . . . . . . 79A.6 ELM In Sample fit - 3 Day forecast . . . . . . . . . . . . . . . 80A.7 ELM Out of Sample fit - 1 Day forecast . . . . . . . . . . . . 80A.8 ELM Out of Sample fit - 3 Day forecast . . . . . . . . . . . . 81

B.1 ELM-variant training results . . . . . . . . . . . . . . . . . . 83B.2 In-sample results for PV in-feed(ELM, no constraint) . . . . . 84B.3 Out of sample results for PV in-feed(ELM, no constraint) . . 84B.4 Autocorrelation of in-sample residuals(SVM, no constraint) . 85

LIST OF FIGURES xix

B.5 Autocorrelation of out of sample residuals(SVM, no constraint) 85B.6 Autocorrelation of in-sample residuals(SVM, constraint 1) . . 86B.7 Autocorrelation of in-sample residuals(SVM, constraint 2) . . 86B.8 Autocorrelation of out of sample residuals(SVM, constraint 1) 87B.9 Autocorrelation of out of sample residuals(SVM, constraint 2) 87

xx LIST OF FIGURES

Chapter 1

Introduction

1.1 Motivation

The deregulation process going on in the electricity market has changeda vertically integrated electricity sector to a horizontally integrated sector.The generation, transmission and distribution sectors are separated and nolonger controlled by single utility. It has increased the risks for the utilitiesin the electricity industry. There is growing need for managing the new riskscoming into the market due to deregulation. Also, the promotion policiesfor renewable energy sources (like feed-in tariff) are forcing the fossil fuelbased generation companies to change the business model. To manage therisks, estimation of quantities like spot price, load profile and forward curvesis required.

Problems with linear estimation methods

The linear estimation methods suffer from the problem of under-estimationand over-estimation. For example, Ordinary Least Squares(OLS) is theMaximum Likelihood Estimator(MLE) of the mean of normal distribution.Hence, using OLS regression for estimating the profile of a time series willlead to estimation based on mean of the data. Non-Linear methods can beused to overcome this problem. Neural Networks are widely used for esti-mation of non-linear systems. However, traditional learning methods sufferfrom non-convexity and over-fitting. Support Vector Machine(SVM)[2] andExtreme Learning Machine(ELM)[3] are two state of the art algorithms fortraining the neural networks. Both of them overcome the problem of non-convexity and hence give a globally optimum solution to the non-linear es-timation problem. This thesis compares the performance of SVM and ELMfor profile forecasting of electricity spot prices.

1

2 CHAPTER 1. INTRODUCTION

Constrained Estimation

In time series analysis, the modeling approach follows three steps - identi-fication, estimation and validation. Estimation is done using learning al-gorithms by applying sample data to the model proposed after identifyingthe characteristics of the time series. After validation, the model is usedfor predicting the time series. Constrained Estimation refers to forecastingsubjected to some constraints. The constraints are defined at the time oftraining the model on a given set of input data. In this way, the estima-tion and prediction is combined [4] and the prediction can be more suitablefor the future developments. This approach is useful for applying any in-formation available for the future to the models that are trained on pastdata.

1.2 Outline

The thesis is organized as follows:

Part I - Forecasting in Electricity Markets

Chapter 2 describes the risks in Electricity markets and a brief introductionto the learning algorithms for the neural networks is given. Chapter 3 de-scribes the theory behind Support Vector Machine and Extreme LearningMachine. Characteristics of electricity spot prices and the different methodsused for spot price forecasting are given in Chapter 4. A time series modelfor spot price forecasting is presented in Chapter 5. Chapter 6 gives detaileddiscussion of the results of the two learning algorithms.

Part II - Constrained Estimation

Chapter 7 describes the methodology used for solving the constrained esti-mation problems using SVM and ELM theory. Chapter 8 provides a casestudy of Photovoltaic infeed forecasting and the SVM and ELM algorithmsare applied for in-feed estimation subjected to constraints.

Chapter 9 concludes the report with the possible future research work.

1.3 Literature Survey

1.3.1 Machine Learning Algorithms

J. A. K. Suykens et. al. [2] presents the concept of Least SquareSupport Vector Machines (LSSVM) that are computationally easier to im-plement than Support Vector Machines(SVM). The implementation detailslike selection of tuning parameters of LSSVM are also presented. The book

1.3. LITERATURE SURVEY 3

also addresses the bayesian inference for LSSVM, robustness and applicationof LSSVM to large data set problems using the Nystrom method.

G. B. Huang et. al. [3] presents a new learning algorithm for singlelayer feedforward neural networks called extreme learning machine (ELM).It is based on generating the parameters of feedforward neural networksrandomly and hence eliminates the need for tuning. Examples of real worldproblems are also presented for comparison of ELM, Back Propagation(BP)and SVM.

Benoit Frenay and Michel Verleysen [5] proposes to bring the ELMframework in the SVM by defining a new kernel called as ELM kernel. Thiseliminates the need to tune the parameters since they are generated ran-domly similar to ELM. The motivation to bring the randomness in SVM isderived from the non-suitability of ELM for classification problems as ELMis not based on the maximum-margin hyperplane principle.

Qiuge Liu, Qing He and Zhongzhi Shi [6] proposes a learning algorithmin which the feature space is explicitly defined using a SLFN that has randominput weights. It has better generalization performance than ELM and isfaster than standard SVM. The proposed algorithm is called as non-linearExtreme Support Vector Machine (ESVM) Classifier.

J. A. K. Suykens and J. Vandewalle [7] give a method to apply theSVM theory to train multilayer perceptron. In case of ELM, the hidden layeris non parametric. In the proposed method, the hidden layer is parametricand the parameters can be tuned by Quadratic Programming problem.

Guan-Bin Huang, Xiaojian Ding and Hongming Zhou[8] give theoptimization based Extreme Learning Machine which has less optimizationconstraints than the SVM. The proposed ELM has better generalizationperformance than SVM and is less sensitive to the number of hidden nodes.

Guang-Bin Huang, Hingming Zhou, Xiaojian Ding and Rui Zhang[9]give a unified ELM for regression and classification. The ELM is presentedas an optimization problem with equality constraints and it is shown thatsingle optimization framework can be used for both regression and classifi-cation. It also introduces random feature mapping and kernels to solve theoptimization based ELM.

1.3.2 Estimation and Prediction for Electricity Markets

Marcelo Espinoza, J. A. K. Suykens, Ronnie Belmans and Bart Demoor[10] provides a load model for Short Term Load Forecasting(STLF)based on non-linear system identification theory. NARX and AR-NARXmodels are proposed for load forecasting and are estimation is done usingSupport Vector Machines. A brief overview of SVM is also given and themodel is evaluated on performance measures like Mean Absolute PercantageError(MAPE) and Mean Square Error(MSE).

4 CHAPTER 1. INTRODUCTION

Weron and Misiorek[11] gives a comparison of various time-series model-ing approaches for forecasting of spot electricity prices. The comparison isdone between parametric and semi-parametric models. A total of 12 differ-ent models are considered for day ahead forecasting in California and Nordicelectricity markets. The semi-parametric models are found to be better thanparametric models for point and interval forecast.Adam Misiorek, Stefan Trueck, Rafal Weron[12] compares the lin-ear and non-linear time-series models for forecasting of electricity prices forCalifornia electricity market. It also gives a brief overview of different typesof approaches for price forecasting like long term, medium term and shortterm.Derek W. Bunn and Nektaria karakatsani[13] reviews the various ap-proaches to model the electricity prices. The review covers the stochasticmodeling approach and structural modeling approach. The factors that in-fluence the electricity prices are presented and modeling of volatility of pricesis also discussed.Marcelo Espinoza et al.[14] gives the implementation of LSSVM for threetime series with different characteristics. The time series differ in termsof seasonality and autocorrelation. Choice of models based on time seriescharacteristics is also explained. All the models are then evaluated in termsof Mean Square Error(MSE).Marcelo Espinoza, J. A. K. Suykens, Bart De Moor[15] address themodification of the NARX model required if the residuals of the NARXmodel shows significant degree of autocorrelation. The resulting modelNAR(X)-AR gives the model that can account for better system dynam-ics. Also, the modified LSSVM equations for these models are derived inthis paper.

Part I

Forecasting in ElectricityMarkets

5

Chapter 2

Electricity Market Analysis

2.1 Electricity Market Deregulation

Figure 2.1: Deregulation of Electricity Industry

The introduction of competition in electricity sector has changed thevertically integrated sector in a horizontal sector. Previously, the gener-ation, transmission and distribution sectors were controlled by single com-pany which made risk management simple as compared to the scenario whenthe generation, transmission and distribution is owned by different entities.Fig. 2.1 shows the change of the vertical industry in a horizontal indus-try. The generation and distribution sectors have seen the entry of privateplayers whose aim is to maximize the profit. Transmission sector remains amonopoly due to security reasons. Traders and regulators are the two newentities. Since, one company no longer holds the entire supply chain, riskmanagement has become a important aspect of the electricity business. Forexample, distribution companies must be able to meet the demands of theconsumers from the electricity bought at the power exchange. If there is

7

8 CHAPTER 2. ELECTRICITY MARKET ANALYSIS

some problem in the procurement of electricity in the delivery time, propermechanism must be in place to get electricity from alternative sources. Theincrease of renewable energy in feed has also changed the nature of theelectricity business. All over the world, governments are implementing poli-cies to favor the renewable sources. This is a source of additional risk fortraditional thermal plants as it has changed the merit order curve.

2.2 Risks in Electricity Markets

Every financial market poses risks to its participants. Electricity markets areno exception. The participants of Electricity market are generation compa-nies, trading companies, distribution companies, consumers and regulators.For example, there is a risk for a generation company that it is not able tosupply electricity at some point of time and hence will not receive the ex-pected cash flows. Similarly, trading companies might face risk of not beingable to sell the financial products. The risks in the Electricity market canbe divided in two broad categories [16], [13]

1. Traditional financial risks

• Price Risk - It is the risk present due to price movements. Ifthe prices are highly volatile, there is a high risk of losses dueto rise/fall in the prices. For example, a generation companywill lose a significant amount of money if the price falls in theelectricity markets.

• Credit Risk - This is the risk of default by a counter party likenon payment of money by the counter party due to bankruptcy.

• Liquidity Risk - Liquidity risk is the risk present when the marketparticipants are not able to close their positions like for exampledue to lack of financial products.

• Operational Risk - This is the risk present in daily operations ofthe markets like failure of information systems, human error etc.

2. Electricity Specific risks

• Volume Risk - Having a delivery contract of electricity does notguarantee the supply of electricity. Supply is affected by the fuelavailability. In case of renewables, the supply is dependent onthe solar radiation, wind speed and hence poses a volume risk ofelectricity.

• Basis Risk - This refers to the risk present due to change in therelative prices of two products. For example, a trading companythat is planning to speculate on the differences of electricity pricesin two regions is exposed to this type of risk.

2.3. NON-LINEAR PROBLEMS IN ELECTRICITY MARKETS 9

• Physical Risk - Electricity need to be transmitted using the gridnetwork. Any technical fault in transmission and distribution sys-tem can lead to loss for the market players. Likewise, limitationsin transmission of electricity due to congestion is also a source ofrisk for electricity markets.

• Regulation Risk - Electricity is a basic need and hence needs tobe regulated. Also, the need to increase the renewable in-feed hasforced the regulators to change the policies in favor of producersof renewable energy. This is a risk for the generation companiesproducing energy from the fossil fuels. They need to adjust tothe constantly evolving policies for the renewable energy.

2.3 Non-Linear Problems in Electricity Markets

To manage the risks in electricity market, forecasting of different marketindicators is required. For example, load profile needs to be forecasted. It isimportant for generation companies to have an idea of the possible load pro-files so that they can optimize the generation costs. Forecasting load profileis also important for Transmission System Operator (TSO) and for compa-nies providing the regulating power. Forecasting is required for spot prices,PV infeed, Wind infeed, Forward curves etc. One of the common require-ment for the forecasting of these market indicators is non-linear estimationmethods.

Figure 2.2: Neural Networks

Artificial Neural Networks are widely used for estimation of non-linearsystems. They are mathematical tools to estimate a non-linear function. Aneural network must be trained before it is used for estimation. In otherwords, they must learn how to perform on a set of input to produce a

10 CHAPTER 2. ELECTRICITY MARKET ANALYSIS

desired output. There are different types of learning approaches: supervisedlearning, unsupervised learning and reinforcement learning. In supervisedlearning, the neural network is trained by a set of inputs and outputs. Duringtraining, the inputs are given along with outputs so that neural networkknows what output to produce for a given set of input. After training,neural netwrok is ready to estimate the output given the new inputs. Inunsupervised learning, only the inputs are given. The aim is to find somepatterns in the input. Reinforcement learning refers to the learning in theneural network has to identify which input should produce which output.This is obtained by maximizing a reward signal [17]. In this thesis, learningalways means supervised learning.

Supervised learning methods

To train the neural networks, a lot of algorithms exists. Many of the learningalgorithms depend on the concept of gradient descent. Gradient descent doesnot guarantee a global minima of the cost function unless some assumptionsare made of the cost function [17]. Support Vector Machine 1 and ExtremeLearning Machine are two algorithms that formulate the learning problemas a optimization problem and hence global minimum is obtained by thesealgorithms (fig. 2.2).

1Support Vector Machines were developed separately from the Neural Networks. How-ever, now it is generally accepted in the research community that the Support VectorMachines can be seen in the framework on Artificial Neural Networks[2]

Chapter 3

SVM and ELM

3.1 Support Vector Machines

Support Vector Machines(SVM) is the machine learning algorithm used forclassification and regression. It was developed originally for classificationproblems but later it was extended to regression problems[2]. SVM theoryis based on the concept of maximum margin hyperplanes which means sep-arating a set of points using a plane that is at maximum possible distancefrom each of the set of points[18]. The maximum distance is calculated bysolving an optimization problem. In this thesis, SVM is explained for regres-sion only. SVM can be applied to both linear and non-linear problems. Forthe sake of simplicity, linear SVM is described first followed by non-linearSVM.

3.1.1 Linear SVM

Consider the regression problem

f(x) = wTx+ b (3.1)

where the input is xkNk=1 with output ykNk=1. The cost function forempirical risk minimization of the regression problem in eq.(3.1) is [2]

Remp =1

N

N∑k=1

∣∣yk − wTxk + b∣∣ε

(3.2)

The cost function Remp is based on Vapnik’s ε-insensitive loss functionwhich is defined as [2]:

|y − f(x)|ε =

0, if |y − f(x)| ≤ ε|y − f(x)| − ε, otherwise

(3.3)

11

12 CHAPTER 3. SVM AND ELM

The variable ε controls the accuracy and is predefined. The regressionproblem in eq.(3.1) is estimated using the following optimization problem[2]

minw,bJp(w) =1

2wTw

such that yk − wTxk − b ≤ ε, k = 1, . . . , N

wTxk + b− yk ≤ ε, k = 1, . . . , N (3.4)

The inequalities in eq.(3.4) means that the training data lies inside ε-tube of accuracy. However, it is possible that the training data might lieoutside the ε accuracy region, and hence eq.(3.4) is modified to include twomore slack variables ξ, ξ? as follows [2]:

minw,b,ξ,ξ?Jp(w, ξ, ξ?) =

1

2wTw + c

N∑k=1

(ξk + ξ?k)

such that yk − wTxk − b ≤ ε+ ξk, k = 1, . . . , N

wTxk + b− yk ≤ ε+ ξ?k, k = 1, . . . , N

ξk, ξ?k ≥ 0, k = 1, . . . , N (3.5)

where c is a regularization constant. The Lagrangian of eq.(3.5) is

L(w, b, ξ, ξ?;α, α?, η, η?) =1

2wTw + c

N∑k=1

(ξk + ξ?k)

−N∑k=1

αk(ε+ ξk − yk + wTxk + b)−N∑k=1

α?k(ε+ ξ?k + yk − wTxk − b)

−N∑k=1

(ηkψk + η?kψ?k) (3.6)

The Karush-Kuhn-Tucker (KKT) conditions of optimality[19] give the

3.1. SUPPORT VECTOR MACHINES 13

following equations [2]:

yk − wTxk − b ≤ ε+ ξk, k = 1, . . . , N

wTxk + b− yk ≤ ε+ ξ?k, k = 1, . . . , N

αk, α?k, ηk, η

?k ≥ 0, k = 1, . . . , N

αk(ε+ ξk − yk + wTxk + b) = 0, k = 1, . . . , N

α?k(ε+ ξ?k + yk − wTxk − b) = 0, k = 1, . . . , N

ηkψk = 0, k = 1, . . . , N

η?kψ?k = 0, k = 1, . . . , N

∂L∂w

= 0→ w =N∑k=1

(αk − α?k)xk

∂L∂b

= 0→N∑k=1

(−αk + α?k) = 0

∂L∂ξ

= 0→ c− αk − ηk = 0

∂L∂ξ?

= 0→ c− α?k − η?k = 0 (3.7)

Eq.(3.7) and eq.(3.6) together gives the following dual problem:

maxα,α?Jd =−1

2

N∑k,l=1

(αk − α?k)(αl − α?l )xTk xl − εN∑k=1

(αk + α?k)

+N∑k=1

yk(αk − α?k)

such thatN∑k=1

(−αk + α?k) = 0

αk, α?k ∈ [0, c] (3.8)

Using the value of w from eq.(3.7) in terms of α and α?, the estimatedfunction from eq.(3.1) can be written as following

f(x) =

N∑k=1

(αk − α?k)xTk x+ b

3.1.2 Nonlinear Extension of SVM

For extending the Linear SVM for non-linear systems, the regression problemis written as following in the primal equation space:

f(x) = wTφ(x) + b (3.9)


The training data is xk, ykNk=1 and φ(.) : Rn → Rnh is a mapping from theinput space to a high dimensional feature space. The optimization problemin the primal space is [2]


1

2wTw + c

N∑k=1

(ξk + ξ?k)

such that yk − wTφ(xk)− b ≤ ε+ ξk, k = 1, . . . , N

wTφ(xk) + b− yk ≤ ε+ ξ?k, k = 1, . . . , N

ξk, ξ?k ≥ 0, k = 1, . . . , N (3.10)

After taking the Lagrangian and applying the conditions of optimality[19],the problem can be written in dual space as [2]

maxα,α?JD(α, α?) =−1

2

N∑k,l=1

(αk − α?k)(αl − α?l )K(xk, xl)

− εN∑k=1

(αk + α?k) +N∑k=1

yk(αk − α?k)

such that

N∑k=1

(αk − α?k) = 0

αk, α?k ∈ [0, c] (3.11)

Here K is the kernel and is defined as K(xk, xl) = φ(xk)Tφ(xl). During

the transformation of the optimization problem from primal to dual(eq.(3.10)to eq.(3.11)), the non-linear effects are moved to the kernel and eq.(3.11) be-comes a convex optimization problem. This is also explained in fig.3.1. Theestimated function can then be written as

f(x) =

N∑k=1

(αk − α?k)K(x, xk) + b (3.12)

In eq. (3.12), the output is written only in terms of the lagrange multi-pliers and the kernel function. Hence, for estimation problems, one does notneed to know the underlying feature space φ(x). This is explained in moredetail in the following section of kernels.

3.1.3 Kernels

Kernels are a class of functions extensively used in statistics and probabilitytheory. A kernel function K maps Rn × Rn → R[2]. The advantage of thekernel functions is that they can be used to avoid the explicit construction


Figure 3.1: Primal and dual optimization problem of SVM

of the feature space φ(x) required for the non-linear SVMs. Fig. 3.1 showsthat the kernel functions remove the non-linear constraints from the primaloptimization problem. Any symmetric continuous function K(x, z) thatsatisfy the Mercers condition [2] can be expressed as

K(x, z) =

nH∑i=1

λiϕi(x)ϕi(z) (3.13)

where ϕ(x) is a mapping from Rn to Hilbert space H, λi is a positive number,x, z ∈ Rn and nH is the dimension of the hilbert space. Eq.(3.13) can bewritten as:

K(x, z) =

nH∑i=1

√λiϕi(x)

√λiϕi(z)

and then, if we define φi(x) =√λiϕi(x) and φi(z) =

√λiϕi(z) which leads

toK(x, z) = φ(x)Tφ(z)

For example, if φ(x) : R → R3 is defined as φ(x) =[x2,√

2x, 1]

[10],then

φ(x)Tφ(z) =[x2,√

2x, 1]T [

z2,√

2z, 1]

=[x2z2 + 2xz + 1

]= (xz + 1)2 (3.14)


This can be represented by polynomial kernels given by

K(x, z) = (xz + c)d (3.15)

with c = 1 and d = 2. In general, the polynomial kernels can be usedto represent any feature map consisting of all possible product monomials

of x up to degree d having dimension nH =

(n+ dn

). So, by defining a

polynomial kernel, there is no need to explicitly define the high dimensionalfeature space φ(x). Different type of kernel functions exists for applicationto non linear systems. A Gaussian kernel is defined as

K(x, z) = exp(−‖x− z‖2

σ2) (3.16)

where σ is a tuning parameter. For a Gaussian kernel, φ(x) is infinite di-mensional [10]. In this thesis, Gaussian kernel is used.

3.1.4 Least Square Support Vector Machines

LSSVM is a method to reduce the computational effort required to solvethe solve the QP problems for the SVM. The optimization problem of SVMcontains inequality constraints. By removing all the inequality constraintsand substituting them by euqality constraints as shown below, it is possibleto reduce the computational effort since the dual problem is reduced to asystem of linear equations.

minw,b,ψJp(w,ψ) =1

2wTw + γ

N∑k=1

(ψ2)

such that yk − wTφ(xk)− b = ψk, k = 1, . . . , N (3.17)

The Lagrangian of primal problem of LSSVM in eq.(3.17) is

L(w, b, ψ, α) =1

2wTw + γ

N∑k=1

(ψ2)−N∑k=1

αk(yk − wTφ(xk)− b− ψk)

(3.18)

where αk are the lagrange multipliers.

After applying the conditions of optimality[19], the dual problem ofeq.(3.17) is obtained as a system of linear equations in α and b[2]:[

0 1Tv1v Ω + I/γ

] [bα

]=

[0y

](3.19)


where y = [y1; . . . ; yN ] , 1v = [1; . . . ; 1] , α = [α1; . . . ;αN ] and Ωkl =φ(xk)

Tφ(xl) = K(xk, xl),K is the kernel. The estimated function is

y(x) =N∑k=1

αkK(x, xk) + b (3.20)

In LSSVM, there are no QP problems to solve. By using the linear solvers(that are faster than convex optimization solvers), the computational speedis increased by several times.

3.1.5 Tuning parameters

The performance of LSSVM depends on the choice of the regularization pa-rameter γ in eq.(3.17)) and any other parameters used (c and d for polyno-mial kernel (eq.(3.15)) or σ for Gaussian kernel (eq.(3.16))). Since Gaussiankernel is used in this thesis, tuning of parameters refers to tuning of (γ, σ)unless otherwise stated. The most popular techniques for tuning of parame-ters are cross-validation and Bayesian inference. Cross-validation is based onselecting parameters after evaluating the performance of a pre-defined grid ofparameters on the training data. In Bayesian inference, the parameters areassumed to have a certain probability density function. For the determina-tion of tuning parameters in this project, cross-validation technique is used.The algorithm of m-fold cross-validation is outlined in Algorithm-1 [10]:

input : Training data T = (xk, yk)Nk=1

output: Tuned parameters (γ, σ)begin

Divide T in m parts T1, . . . , Tm such that T = ∪mk=1Tk;Define a N1 ×N2 grid of γ and σ;for all combinations of γ and σ do

for k =1:m doDefine a set Sk = ∪mi=1,i 6=kTi;Train SVM on Sk;Calculate the performance of the SVM on the set Tk. Thiscan be done by defining a loss function ρ ( for example-Mean Square Error);

end

endSelect the γ and σ with the lowest value of loss function ρ;

endAlgorithm 1: m-fold cross validation for parameter selection [10]

The most common value of m is 5 and 10. For m=1, it is called leave oneout cross validation. Leave one out cross validation is the less biased than5 fold or 10 fold cross validation. However, the choice of m also dependson the size of data. Leave-one-out crossvalidation is computationally more


intensive than other m-fold cross validation. Due to this reason, 10 foldcross validation is used for tuning the parameters and the loss function usedis Mean Absolute Percentage Error(MAPE).

3.2 Extreme Learning Machines

The learning speed of SLFN is very slow since all the parameters needs to betuned iteratively. In [3], authors have proposed a new learning algorithm forthe single-hidden layer feedforward neural networks (SLFNs) called ExtremeLearning Machine(ELM). ELM is based on random hidden nodes whichmeans that the activation function parameters are chosen randomly. Thenit analytically determines the output weight of SLFNs.

3.2.1 ELM theory

Figure 3.2: Single Layer Feedforward Neural Network

Consider the SLFN withNh hidden nodes. The training data is xk, ykNk=1

where x ∈ Rn and y ∈ Rm. The SLFN can be written as

Nh∑i=1

βihi(xk) = yk, k = 1, . . . , N (3.21)

hi(xk) = h(wixk + bi) (3.22)

where wi ∈ Rn is the weight vectors between the n input nodes and ithhidden node, βi ∈ Rm is the weight vector between the ith hidden nodeand the output nodes, bi is the threshold of the ith hidden node and h is

3.2. EXTREME LEARNING MACHINES 19

the activation function. Fig. 3.2 shows the architecture of SLFN. In matrixform, eq.(3.22) is

Hβ = Y (3.23)

where

HN×Nh=

h(w1x1 + b1) . . . h(wNhx1 + bNh

)...

. . ....

h(w1xN + b1) . . . h(wNhxN + bNh

)

(3.24)

βNh×m =

β1...βNh

and YN×m =

y1...N

(3.25)

H is called the hidden layer matrix. It gives the transformation functionfrom the input space to the hidden neurons space. The ELM is based onthe following two theorems [3]:

Theorem 1 Given a standard SLFN with N hidden nodes and activationfunction h : R → R which is infinitely differentiable in any interval, for Narbitrary distinct samples xk, ykNk=1 where x ∈ Rn and y ∈ Rm, for any wiand bi randomly chosen from any intervals of Rn and R, respectively, accord-ing to any continuous probability distribution, then with probability one, thehidden layer output matrix H of the SLFN is invertible and

∥∥Hβ −Y∥∥ = 0

Theorem 2 Given any small positive value ε > 0 and activation functionh : R → R which is infinitely differentiable in any interval, there existsNh ≤ N such that for N arbitrary distinct samples xk, ykNk=1 where x ∈ Rnand y ∈ Rm, for any wi and bi randomly chosen from any intervals of Rnand R, respectively, according to any continuous probability distribution, thenwith probability one,

∥∥Hβ −Y∥∥ < ε.

For proof of both theorems see [3]. For training the SLFN, eq.(3.22)leads to finding wi, bi and β such that∥∥H(w, bi)β −Y

∥∥ = minwi,bi,β

∥∥H(w, b)β −Y∥∥ (3.26)

where w = w1, . . . , wnh] and b = b1, . . . , bNh

. Since w and b are chosenrandomly, eq.(3.26) is reduced to∥∥H(w, bi)β −Y

∥∥ = minβ∥∥H(w, b)β −Y

∥∥ (3.27)

Solution of eq.(3.27) is given by

β = H†Y (3.28)


where H† is the Moore-Penrose generalized inverse of matrix H [20]. The βgiven by eq.(3.28) is the unique least square solution of eq.(3.22) and alsohas the smallest norm of weights which means

∥∥∥β∥∥∥ =∥∥∥H†Y∥∥∥ ≤ ‖β‖ ,∀β ∈ β :

∥∥Hβ −Y∥∥ ≤ ∥∥Hz−Y

∥∥ ,∀z ∈ RN×Nh

(3.29)According to [21], the generalization performance of feedforward neural

networks that reach the small training error, the smaller the norm of weightsis, the better the generalization performance of the network. Since moore-penrose generalized inverse gives the least norm solution [20], ELM has goodgeneralization performance.

ELM Algorithm

Given the training data xk, ykNk=1 where x ∈ Rn and y ∈ Rm, activationfunction h and number of hidden nodes Nh, the ELM algorithm can bewritten as [3]:

1. Assign the weights wi and threshold bi randomly.

2. Calculate H.

3. Calculate β using eq.(3.28)

4. The output for a new input x is given by f(x) = h(x)β where

h(x) = [h(w1x+ b1) . . . h(wNhx+ bNh

)] , w and b are from step 1

Chapter 4

Case Study - Spot PriceForecasting

4.1 Characteristics of Electricity Prices

The characteristics of electricity that makes it different from other commodi-ties are [16] :

1. A real time commodity: Electricity is a real time commodity whichmeans that it must be consumed at the same time it is generated.Any imbalance in the production and consumption will lead to devia-tion in frequency and this will affect the stability of the grid. Higherproduction than consumption will increase the frequency and lowerproduction than consumption will decrease the frequency.

2. Non storable good: It is not possible to store electricity. Options tostore electricity exists at a small scale(e.g. batteries) but it is difficulton a large scale. The pricing of a forward contract depends on thestorage costs. Hence, the pricing of electricity contracts cannot bedone similar to commodities that can be stored.

3. Characteristics of demand and supply: Electricity is a essentialcommodity which makes the demand of electricity inelastic. Supply isdecided by the merit order curve of the power plants. If the demandis low, it can be met by base load generators. With the increase inrenewable infeed, the supply curve has changed significantly.

Due to different nature of electricity as compared to other commodities,spot price of electricity is difficult to predict. Also, since electricity cannotbe stored, the pricing of electricity forward contracts cannot be done on thebasis of the classic formula of forward pricing:

Ft = (S0 + U) exp−r(T−t)

21

22 CHAPTER 4. CASE STUDY - SPOT PRICE FORECASTING

where Ft is the Forward price,S0 is the spot price at t = 0 and U is the storagecosts with interest rate r and time to maturity T − t. The characteristics ofelectricity prices are described below [16]:

1. Multi Scale Seasonality: Electricity prices show seasonal patterns.It show intra-day, weekly and monthly seasonal cycles (hence the namemulti-scale seasonality). During a day, prices are higher at noon andduring evening because of high economic activity. Also, weekdays havehigher prices than weekends due to more demand. The price level alsovary with months. During winter, there is high heating requirementthat drives the electricity prices higher.

2. Dependency on External Factors: Electricity prices also dependon external factors like temperature and load. The prices closely fol-lows the trends of the load profile. High load increases the price.

3. Mean reversion: The mean reversion property means that the spotprices tend to move towards an average value. The variations in thespot prices are assumed to be temporary and the modeling of prices isdone using stochastic approach.

4. Jumps: Electricity prices are can move to high levels in very short pe-riod of time. Since electricity cannot be stored, any extreme event(e.g.a plant outage) can drive the prices higher in a short time interval.

4.2 Methods for Price forecasting

4.2.1 Importance and need of price forecasting

The introduction of deregulated electricity markets has bought the need tostudy the electricity industry in more detail. On the generation side, manyprivate players have entered. Power exchanges have been set up to facilitateelectricity trading which has led to the entry of trading companies. Theretail and distribution side has also been privatized. The residential sectoris also opening slowly. The structural change in the electricity industryhas given rise to many risks and the spot price is one of the basic elementfor managing these risks. The markets are still not full liberalized. Thisfurther increase the need for price forecasting. Also, the technical issuesrelated with electricity makes it very difficult for the markets to set up theelectricity prices. Fundamentally, the prices are governed by the marginalcosts of the generators, congestion in the transmission grid and other gridsecurity issues, fuel prices, demand for electricity and policies set up bythe regulator. The lack of liquidity in the future markets of electricity alsostresses the need to forecast the prices.

4.2. METHODS FOR PRICE FORECASTING 23

4.2.2 Modeling of Electricity Prices

The choice of selecting a modeling method for price depends on the appli-cation of the model. Long term price forecasting is used in the investmentdecisions and power system planning. Short term price forecasting is mainlyused for the day ahead planning of generation schedule or by trading com-panies for bidding purposes. For example, accurate forecasting will help thegenerators to plan the generation schedule with the aim of profit maximiza-tion. For long term price forecasting, the model must be able to capturethe fundamental elements that form the prices like costs of various genera-tion technologies, emission constraints, demand, congestion in grid etc. Forshort term price forecasting, the model should be able to capture the statis-tical properties of the electricity prices over time. Accordingly, the modelingapproach for price forecasting is divided in two categories [13]:

1. Fundamental models: Fundamental models are the optimizationproblems that aims to determine the marginal cost of every technol-ogy/power plant in a specified region. The market price is determinedby the merit order curve and the demand of the electricity. The mostcommon inputs to a fundamental model are

• regional demand

• power plant cost curves

• plant data like retirement

• fuel prices

• emissions constraints

• transmission constraints

These input are used to form a cost optimization problem consideringthe energy constraint, reserve capacity requirement, transmission con-straints and unit chronological constraints. The resulting optimizationproblem can be solved using an optimization solver and the results arethe marginal costs of every power plant considered in the optimizationequation. These can further be used to find out the cash flows, crossborder flows, transmission capacity available and emissions.

2. Quantitative models: The quantitative models are used to find outthe statistical characteristics of the spot prices with the main aim ofrisk management [11]. The ultimate motive is not to give an accu-rate number to price value but to find out the relative movement ofthe prices over short term time horizons. Volatility of spot prices isthe most important variable for risk managers. Stochastic DifferentialEquations (SDE) are typically used to characterize the properties ofspot prices like random walk, mean reversion and jumps. A few simplemodels based on SDE are enumerated below [16]:


Figure 4.1: Combining Fundamental and Quantitative Approach for priceforecasting [1]

• Random Walk: In this model, prices are assume to follow ran-dom walk. The SDE for a random walk model is:

dPt = µPtdt+ σPtdWt

Here, the prices are modelled as a Geometric Brownian Motion.Pt denotes the spot price, Wt is a Weiner process, µ is a driftterm, σ is the volatility and t is time

• Mean Reversion: In addition to random walk of the prices, thismodel also captures the mean reversion property of electricityprices. The SDE for a mean reversion model is:

dPt = a(µ− Pt)dt+ σPtdWt

Here, a is the speed of mean reversion

• Mean Reversion with jumps: The mean reversion model canbe extended to explain the jumps seen in the price time series asfollows:

dPt = a(µ− Pt)dt+ σPtdWt + kdqt

Here, kdqt represents the jumps with k representing the severityof the jump and qt represents the frequency of the jumps. For adetailed explanation of these models, refer [16]

In [1], authors have proposed to combine the two approaches. The argu-ment is that since the markets are not mature and due to low liquidity, it is

4.2. METHODS FOR PRICE FORECASTING 25

necessary to look for additional data and hence the modeling can be mademore comprehensive if the fundamental approach and quantitative approachare combined. Fig. 4.1 explains the approach. The bottom-up approach in-cludes making a model based on considerations of the fundamental structureof the system (here electricity spot prices). The fundamental model requiresexternal data (price depend on factors like temperature and load). The Fun-damental model can be combined with financial approach that is based onmodel building using stochastic differential equations. The combination offundamental approach and financial approach is used to build price scenar-ios.

Chapter 5

Model Representation andEstimation

5.1 Time Series Analysis

To build a model for time series forecasting, a three step procedure isfollowed[22]:

1. Identification- In this step, the data is analyzed to find out the char-acteristics of time series. For example, time series might have signifi-cant auto-correlation. Autocorrelation factor can be used to find outthe suitable lag factors to be used in the modeling. Also, dependenceof time series on external factors should be analyzed.

2. Estimation- Based on the information obtained in the first step, amodel is proposed. This model should be able to explain all the under-lying characteristics like moving average, auto-correlation, dependencyon external factors etc.

3. Diagnostic Checking- This step involves applying statistical teststo verify if the model is able to capture all the dynamics of the timeseries. One of the ways is to check the autocorrelation of the residuals.There should not be any significant degree of auto-correlation amongthe residuals. If the residuals show some correlation, it means that themodel is not able to capture all the dynamics of the time series.

Fig.5.1 shows a sample spot price time series and the steps involved inthe forecasting.

5.2 Short Term Spot Price Model

The spot price profile is defined as

yt = yt + εt (5.1)

27

28 CHAPTER 5. MODEL REPRESENTATION AND ESTIMATION

Figure 5.1: Steps in time series forecasting

where yt is the spot price with average yt and εt is a white noise processwith zero expectation and a finite variance i.e. ε ∈ (0, ϑ2). The model forthe hourly spot price forecasting is

yt = yt + χt (5.2)

Here, yt is the estimated spot price with estimated average yt and χt is awhite noise process with zero expectation and a finite variance i.e. χt ∈(0, ϑ2). The estimated average is written as the function of a regressionvector xt ∈ Rn

yt = f(xt) (5.3)

Characteristics of Spot Prices

Fig. 5.2 shows the autocorrelation of the hourly spot price series for Ger-many taken from EPEX website for the month of March 2012. It showsthat the spot prices are significantly correlated with the previous hour val-ues. The blue lines show the 95% confidence bounds. Any value of auto-correlation above these bounds is termed significant and must be explainedby the proposed model.

Fig.5.3 shows the relation between electricity price and load. The profileof load and price follows the same pattern. When load increases, priceincreases and when it decreases, price decreases. Fig. 5.4 show the relationbetween temperature and price for a summer day. Low temperature implieslow price due to less cooling requirement.

5.2. SHORT TERM SPOT PRICE MODEL 29

0 5 10 15 20−0.2

0

0.2

0.4

0.6

0.8

Lag

Sam

ple

Aut

ocor

rela

tion

Autocorrelation of a sample spot price time series

Figure 5.2: Autocorrelation of the EPEX prices for March 2012

0 20 40 60 80 100

0.4

0.5

0.6

0.7

0.8

0.9

1

Nor

mal

ized

Loa

d an

d E

lect

ricity

Pric

es

hours

Electricity PriceLoad

Figure 5.3: Correlation of electricity prices and load


0 20 40 60 80 100

0.4

0.5

0.6

0.7

0.8

0.9

1

Nor

mal

ized

tem

pera

ture

and

ele

ctric

ity p

rices

hours

Electricity PriceMean Temperature

Figure 5.4: Correlation of electricity prices and temperature

Fig.5.5 shows the multi-scale seasonality of the spot prices. The houraxis shows the 168 hours of a week. It starts from Monday. The week axisshow the weeks of 4 months. It starts from February. Looking along thehour axis, spot price are high during weekdays and low during weekends.Also, a day has two peaks. The economic activity is high during the weekand hence the price are high. Intra-day peaks are explained by the relativelyhigh consumption of electricity at noon and during evening. Along the weekaxis, February shows high prices due to high heating requirement. Pricesdecrease during the summer as there is no heating required.

Model for spot prices

The regression vector should be able to capture the multi-scale seasonalityand dependency on external factors of the spot prices. It should also beable to explain the strong autocorrelation observed in the price series. Inthe proposed model, the evolution of spot prices is explained by the value ofthe spot prices of the previous hours, a set of exogenous variables and a setof dummy variables (fig.5.6). The exogenous variables include previous yearload data and weather variables, dummy variables are used for seasonality.All of them are described below:

1. Weather variables include maximum temperature Tmaxt , minimum tem-perature Tmint , mean temperature , wind speed W s

t and precipitationPPt. They can be grouped together as Wt

Weat =[Tmaxt Tmint Tmeant W s

t PPt]

5.2. SHORT TERM SPOT PRICE MODEL 31

050

100150

200

05

1015

20

−50

0

50

100

150

200

250

hoursweeks

Pric

e in

eur

o/M

Wh

Figure 5.5: Seasonality of electricity prices

2. To capture the monthly, weekly and intraday seasonality, determinis-tic binary values are used for computational simplicity [10]. A dayis represented by a vector Dt ∈ 0, 17. for example, Sunday is[1 0 0 0 0 0 0

]. Similarly, hours are represented by vectors

Ht ∈ 0, 124 and months by Mt ∈ 0, 112

3. The load include the vertical load data (taken from ?) for the pasttwo years and is represented by Lt ∈ R2

The regression vector can be written as

xt = yt−1, . . . , yt−j ,Weat, Ht, Dt,Mt, Lt

j is the amount of lag that we want to include in the model. In total, thereare j(lag)+5(weather)+24(hours)+7(day)+12(month)+2(load) = j+50variables. The proposed model is called a NARX model (fig.5.6).

5.2.1 Forecast Accuracy Measures

To check the model for accuracy, following statistical measures of error areused:

1. Mean Absolute Error(MAE): Mean Absolute Error is an indicatorto find out how close the results are to the actual values. It is definedas follows:

MAE =

∑Nt=1

∣∣yt − y∣∣N


Figure 5.6: NARX model for electricity spot prices

2. Mean Absolute Percentage Error(MAPE): Mean Absolute Per-centage Error is another accuracy measure for time series forecasts.The accuracy of the model is determined by the absolute value of thepercentage errors. It is defined as follows:

MAPE =1

N

N∑t=1

∣∣∣∣ yt − yyt

∣∣∣∣

Chapter 6

Empirical Analysis- PriceForecasting

In this chapter, the results of the spot price model are presented. Thecode for the model is written in MATLAB and CVX [19] is used as theoptimization solver.

6.1 Spot Price Model

0 500 1000 1500 2000 2500 3000−50

0

50

100

150

200

250

euro

/MW

h

hours

Spot price series (Mean=43.96, Std=17.94)

Figure 6.1: Spot price time series for Germany(Feb, 2012 to May, 2012)

The spot price model presented in the previous chapter is applied to thespot price time series for Germany taken from the EPEX website [23]. Fig.6.1 shows the spot price time series and following points describes the dataused:

33

34 CHAPTER 6. EMPIRICAL ANALYSIS- PRICE FORECASTING

• The spot prices are from February 2012 to May 2012.

• The resolution of the prices is one hour.

• Unless stated otherwise, the length of the insample data is 70% of thetraining data. The rest of the training data is used for the out ofsampling predictions to find out the preformance of the model. Thelength of the both insample and out of sample data is rounded off tothe nearest multiple of 24.

• The vertical load data is taken from TSO website [26]

• All the data for weather variables come from Bloomberg.

1 29 57 86 114 143 171 200

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

autocorrelation of the price time series

Lag Length

Figure 6.2: Autocorrelation of the Spot price time series

Fig. 6.2 shows the autocorrelation plot for the price time series till lag of200. The two blue lines above and below the x-axis show the 95% confidencebounds. The prices show a high degree of autocorrelation. In the model,the amount of lag is kept as a variable and the performance indicators arethen used to find out the best choice of the lag value. The model is testedfor different lags(1 day, 2 day and 4 day) and forecasting days (1 day, 3 daysand 5 days).

6.2. LSSVMAND ELM - SIMULATION RESULTS FOR SPOT PRICES35

6.2 LSSVM and ELM - Simulation results for SpotPrices

6.2.1 Parameter Selection for LSSVM

The LSSVM method described in eq.(3.19) is applied to the spot price model.LSSVM model requires tuning of parameters γ (eq.(3.17)) and σ (gaussiankernel parameter). 10-fold crossvalidation is used for tuning (γ, σ). γ con-trols the error term in the optimization problem of LSSVM and σ controlsthe shape of the Gaussian kernel. The algorithm for m-fold crossvalidationis explained previously. The grid for selecting (γ, σ) is a 25×20 grid of differ-ent combination of elements of vector (21, 22, . . . , 225) and (21, 22, . . . , 220).Fig. 6.3 - Fig. 6.5 shows the corssvalidation scores for 1 day lag, 2 day lagand 4 day lag. The value of MAPE depends on the choice of (γ, σ). Also,the amount of lags affect the MAPE results which indicate that differentlags need to be tested for better results.

05

1015

20

0

10

20

30

0.16

0.18

0.2

0.22

0.24

0.26

0.28

GammaSigma

MA

PE

Figure 6.3: Cross validation scores(1 day lag)

6.2.2 Training Results

Fig. 6.6 shows the results of LSSVM model on the training data. In thehours 0− 200, LSSVM method is able to capture the information from thehigh prices and hence the actual spot price (blue points) are covered by theLSSVM model (red line). In other cases, for example, from hours 600−1200,LSSVM model does not cover the actual spot price. This is expected becausethe prices are not high in this region and hence there is no information topredict the spikes. It also shows that the LSSVM method does not sufferfrom over-fitting. Matching of the training results of LSSVM with spikes in


05

1015

20 05

1015

2025

0.1

0.15

0.2

0.25

0.3

0.35

GammaSigma

MA

PE


05

1015

20

010

2030

0.1

0.12

0.14

0.16

0.18

0.2

0.22

0.24

GammaSigma

MA

PE



0 200 400 600 800 1000 1200 1400 1600 1800 2000−50

0

50

100

150

200

250

hours

euro

/MW

h

function estimation using LS−SVMγ=88.7925,σ2=53678.8552

RBF datapoints (blue .), and estimation (red line)

Figure 6.6: LSSVM training results

the region that does not contain high prices shows over-fitting. In regions,where the prices do not contain spikes, the LSSVM model is fully able tocapture the trends in the time series. Fig. 6.7 shows the training results ofELM. As compared to LSSVM, ELM results show overfitting. In the hours0 − 200, ELM is able to capture the information similar to LSSVM but inhours 0− 1200, ELM is able to capture all the peaks even though the pricesin the surrounding region are not high. This is because ELM is able toobtain zero error on the in-sample data (theorem 1) if the number of hiddenneurons are greater than the length of in-sample data. This enables ELMto capture all the peaks in the training data and causes overfitting.

0 200 400 600 800 1000 1200 1400 1600 1800 2000−50

0

50

100

150

200

250ELM results (Mean=46.430362, Std=19.469965)

euro

/MW

h

hours

Figure 6.7: ELM training results


6.2.3 Forecasting Performance

The forecasting performance of LSSVM and ELM is tested on in-sampledata and out of sample data. The forecasting horizon is 1 day, 3 days and 5days. Only the forecasting for 5 days is presented here. For the rest of thecases, please see Appendix 1.

In-Sample

Fig. 6.8 shows the forecasting of LSSVM and ELM on in-sample data.The forecasting is done on a one step ahead basis. The forecasted value ofthe previous hour is used to update the input vector (autoregressive terms)and then the updated input vector is used to forecast the price in the nexthour. In-sample prediction by LSSVM model is able to capture the spotprice characteristics as illustrated by the low value of MAPE (for Fig. 6.8,MAPE is 0.0483). In general, LSSVM model performs good on the in sampledata. A typical daily profile of spot prices has two peaks- one in afternoonand other in evening. LSSVM is able to capture both the peaks as shownin Fig. 6.8. ELM also gives a low error in in-sample forecasting. It isable to capture the intra-day peaks as well. Both LSSVM and ELM areable to capture all the profile variations of the price in in-sample data (priceestimation surrounding hour 80 in fig. 6.8). All the peaks are also forecasted.

0 20 40 60 80 100 12020

40

60

80

100

120

140

160

180

200

220

hours

euro

/MW

h

ELMSVMActual

Figure 6.8: LSSVM In Sample fit - 5 Day forecast

Out of Sample

Fig. 6.9 shows the out of sample simulations. The out of sample simulationhas higher MAPE than the in-sample fit for both LSSVM and ELM. Theerror increases as the horizon of forecasting is increased. However, intra-dayseasonality is successfully captured by out of sample simulations for both


0 20 40 60 80 100 12020

40

60

80

100

120

140

hours

euro

/MW

h

ELMSVMActual

Figure 6.9: LSSVM Out of Sample fit - 5 Day forecast

LSSVM and ELM. In-sample simulations are better than out of sample incapturing the spikes. For the 5 day-ahead out of sample forecasting, MAPEfor LSSVM is 0.144 and for ELM is 0.403. This is because ELM shows betterfitting on in-sample data. Overfitting on insample data causes the ELM togive a less accurate forecast than LSSVM on out of sample data. BothLSSVM and ELM are not able to capture the small profile variations of spotprice. This is expected because the model aims to capture the characteristicsof spot price based on seasonality and external factors. For the small profilevariations, more information is required.

6.2.4 Residual Analysis

Analysis of residuals is the last step of time series modeling. A model canbe validated by performing a few statistical tests on the residuals. Residualsshould be checked for any autocorrelation and assumption of white noisewith zero mean and finite variance should also be verified. The residuals areanalyzed by following methods:

Autocorrelation of residuals

Fig. 6.10 and fig. 6.11 show the autocorrelation of the residuals for LSSVMand ELM. The blue lines show the 95% confidence bounds. Any value out-side these bounds show a significant autocorrelation. For LSSVM, only lag2 show significant values. For ELM, lag 2 and 3 show significant values. Allthe other values for different lags are within the bound represented by theblue lines. This validates the model. If the residuals are found to be auto-correlated, it means that the model is not able to capture all the dynamicsof the time series. In such a case, the model must be changed. One way to


0 2 4 6 8 10 12 14 16 18 20−0.5

0

0.5

1

Lag

Sam

ple

Aut

ocor

rela

tion

autocorrelation of prediction error for out of sample data

Figure 6.10: Autocorrelation of Out of Sample residuals for LSSVM

change the NARX models if the residuals are found to be autocorrelated isto use AR-NARX model[2]. AR-NARX model include terms for autocorre-lated residuals and hence can correct the missing dynamics of the time seriesin the results of NARX model.

Histogram

Eq. (8.1) assumes that the prices can be described by an average value andan error term. The error term is assumed to be white noise. To validatethis, histogram is used. The results for LSSVM and ELM are shown infig.6.12 and fig.6.13. The normal distribution is shown by red line. Forboth LSSVM and ELM, the assumption of white noise does not hold. Thedistribution of residuals do not fit to the normal distribution. To overcomethis, bootstrapping [24] can be used to report the results of spot prices alongwith residuals.

6.3 Forecast Accuracy Analysis

Table 6.1 and Table 6.2 show the complete results for 1 day ahead, 3 dayahead and 5 day ahead forecast for both LSSVM and ELM. The resultsare simulated for 1 day, 2 day and 4 day lag. Every simulation is reportedwith three figures: MAPE, MAE and Standard deviation of error. Resultsare given for in-sample and out of sample simulations. LSSVM gives betterresults for 4 day lag as compared to 3 day lag. The magnitude of MAPEincreases from 1 day ahead to 5 day ahead forecasting. Also the MAE of

6.3. FORECAST ACCURACY ANALYSIS 41

0 2 4 6 8 10 12 14 16 18 20−0.5

0

0.5

1

Lag

Sam

ple

Aut

ocor

rela

tion

autocorrelation of prediction error for out of sample data

Figure 6.11: Autocorrelation of Out of Sample residuals for ELM

−10 −5 0 5 10 150

1

2

3

4

5

6

7

8prediction error for out of sample data (Mean=2.94,Std=3.82,MAPE=0.0969,MSE=22.6082)

No

of e

rror

s

errors

Figure 6.12: Histogram of Out of Sample residuals for LSSVM


−15 −10 −5 0 5 10 15 20 25 30 350

1

2

3

4

5

6

7

8prediction error for out of sample data (Mean=9.57,Std=7.33,MAPE=0.2429,MSE=143.1494)

No

of e

rror

s

errors

Figure 6.13: Histogram of Out of Sample residuals for ELM

LSSVM is low for 4 day lag as comparison to 1 day and 2 day lag.For in-sample simulations of ELM, MAPE for 1 day is comparable to 4

day lag (lower for 1 day ahead and 3 day ahead but higher for 4 day ahead).For out of sample simulations of ELM, MAPE for 4 day lag is lower than 1day lag. So, 4 day lag seems to be better choice than 1 day and 2 day lag.Also, ELM shows low MAPE than LSSVM for in-sample simulations but forout of sample simulations, the MAPE of LSSVM is lower than ELM.

6.3. FORECAST ACCURACY ANALYSIS 43

1 day ahead forecast

1 day lag 2 day lag 4 day lag

LSSVM ELM LSSVM ELM LSSVM ELM

MAPE 0.196 0.039 0.167 0.366 0.048 0.099MAE 18.2623 3.4595 15.2421 34.2326 4.8743 9.4666Std of Error 21.922 3.223 17.779 27.139 6.11 8.472









Table 6.1: Model Performance - Insample













Table 6.2: Model Performance - Out of Sample


6.4 Transition Case

0 100 200 300 400 500 6000

50

100

150

200

250

euro

/MW

h

hours

Figure 6.14: In-Sample spot prices for transition case

In this section, LSSVM and ELM are simulated on a transition period.Fig. 6.14 shows that data used to train the LSSVM and ELM. It con-tains highly volatile price series. The price vary from 210 euro/MWh to 20euro/MWh. The simulation is done for price shown in fig. 6.15. The pricesvary from 25 euro/MWh to 85 euro/MWh. This will help in comparingthe performance of LSSVM and ELM for a transition period (e.g.- extremeevents like grid failure).

Fig.6.16 and fig. 6.17 show the in-sample and out of sample simulationresults. Exact values for the forecast accuracy measures (MAPE, MAE andStd of Error) are given in Table 6.3 and 6.4. LSSVM and ELM results showsimilar characteristics to the results presented earlier. For 4 day lag, LSSVMworks better than ELM for out of sample forecasting. The relative value ofMAPE for both LSSVM and ELM are of the same magnitude as that ofresults presented in the non transition case. This proves that LSSVM andELM produce good results for transition periods also.

6.5 LSSVM and ELM - Execution Time

Main bottleneck for the LSSVM algorithm is the tuning of the parame-ters. LSSVM requires the error parameters and any kernel parameter to betuned before forecasting. The parameters are tuned by cross-validation. Ascross-validation is computationally intensive, the time taken for tuning theparameters is large. Fig. 6.18 and fig. 6.19 shows the MATLAB profilerresults for LSSVM and ELM.

6.5. LSSVM AND ELM - EXECUTION TIME 45

0 20 40 60 80 100 120 14020

30

40

50

60

70

80

90

euro

/MW

h

hours

Figure 6.15: Out of Sample spot prices for transition case

0 20 40 60 80 100 12020

40

60

80

100

120

140

160

180

200

220

hours

euro

/MW

h

ELMSVMActual

Figure 6.16: LSSVM and ELM performance for transition case (in-sample)


0 20 40 60 80 100 12020

40

60

80

100

120

140

hours

euro

/MW

h

ELMSVMActual

Figure 6.17: LSSVM and ELM performance for transition case (out of sam-ple)













Table 6.3: Model Performance - Insample transition case

6.5. LSSVM AND ELM - EXECUTION TIME 47













Table 6.4: Model Performance - Out Sample transition case

Figure 6.18: MATLAB profiler results for LSSVM


Figure 6.19: MATLAB profiler results for ELM

Profiler gives the running time taken by all the functions used in thecode. Function crossvallinsvm implements the crossvalidation for LSSVM.It takes 22329 seconds to execute. For ELM there is no parameter to betuned. The number of hidden neurons is a parameter for ELM but if thenumber of hidden neurons is of same magnitude as the length of the train-ing data, the results do not depend on the number of hidden neurons [3].Function trainelmmoore implements the ELM and it takes just 1.6 seconds.Hence, ELM is much faster than LSSVM.

Part II


49

Chapter 7


Figure 7.1: Constrained Estimation

In forecasting, it is sometimes desirable to put some constraints duringthe estimation step. Fig. 7.1 shows non-constrained estimation and con-strained estimation. In non-constrained estimation, a model is proposed fora given time series and it is trained using training data and predictions aremade. This approach is used in the first part of the thesis. In constrainedestimation, constraints are added at the time of training the model as shownin fig. 7.1. The model is trained such that it fits the training data as well assatisfy the constraints. This will help in making predictions that will followthe constraints. For example, consider the forecasting of PV infeed. Basedon a empirical study that relates the feed-in tariff (FIT) and investmentin PV industry, it is possible to define a set of constraints that relate thepossible changes in FIT to the amount of PV infeed. These constrains canbe incorporated in the model at the time of training and then predictions

51

52 CHAPTER 7. CONSTRAINED ESTIMATION

can be made. This will result in a model that fits the training data and atthe same time satisfy the constraints also. This part of the thesis explainsthe constrained estimation problems under the framework of Support VectorMachines and Extreme Learning Machines.

7.1 SVM and ELM for constrained estimation

To apply the non-linear estimation algorithms to constrained estimation,Support Vector Machines and Extreme Learning Machines are proposed inthis chapter. For using the support vector regression theory, it is proposedto use the SVM with random feature spaces [5] and for using ELM theory,an optimization based ELM is used[8]. The constrained estimation problemcan be written as:

f(x) = wTφ(x) + b (7.1)

such that wTφ(x?) ≤ Λ (7.2)

Here, f(x) is the function to be estimated, x is the input vector, φ is a highdimensional mapping similar to the estimation problem discussed previouslyand (w, b) are the parameters of the model. The given constraint shows justan example of the possible constraints. It means that during prediction thevalue of the function should not be more than Λ for inputs x?.

7.2 SVM with external constraints

To solve the estimation problems with constraints, it is required to includethe external constraints during the formulation of the optimization problemof SVM. Consider the optimization problem of a standard SVM as given ineq.(3.4). To solve a time series estimation problem with external constraintsusing SVM theory, the optimization problem is modified along the lines ofeq. (7.2) as


1

2wTw + c

N∑k=1

(ξk + ξ?k)

such that yk − wTφ(xk)− b ≤ ε+ ξk, k = 1, . . . , N

wTφ(xk) + b− yk ≤ ε+ ξ?k, k = 1, . . . , N

ξk, ξ?k ≥ 0, k = 1, . . . , N

wTφ(xτ ) + b ≤ ψ, τ ∈ S, S ⊆ [N + 1, . . . ,M ] (7.3)

Eq.(7.3) is written for a time series defined by a training set xk, ykNk=1,τ is the future time for which we want to add constraints and M ≥ N + 1.For other parameter, please refer eq.(3.4). One way to solve the primalproblem of eq.(7.3) is to convert it in the dual problem as described next.

7.2. SVM WITH EXTERNAL CONSTRAINTS 53

7.2.1 Solving the dual problem

The Lagrangian of the primal problem in eq.(7.3) is

L(w, b, ξ, ξ?;α, α?, η, η?, γ) =1

2wTw + c

N∑k=1

(ξk + ξ?k)

−N∑k=1

αk(ε+ ξk − yk + wTφ(xk) + b)−N∑k=1

α?k(ε+ ξ?k + yk − wTφ(xk)− b)

−N∑k=1

(ηkψk + η?kψ?k) +

M∑τ=N+1

γτ (wTφ(xτ ) + b− ψ) (7.4)

The conditions of optimality give

∂L∂w

= 0→ w =

N∑k=1

(αk − α?k)φ(xk)−M∑

τ=N+1

γτφ(xτ )

∂L∂b

= 0→N∑k=1

(−αk + α?k) +

M∑τ=N+1

γτ = 0

∂L∂ξ

= 0→ c− αk − ηk = 0

∂L∂ξ?

= 0→ c− α?k − η?k = 0 (7.5)

The dual problem can be written using eq.(7.5) and eq.(7.4)

maxα,α?,γJd =−1

2

N∑k,l=1

(αk − α?k)(αl − α?l )K(xk, xl)

N∑k=1

M∑τ=N+1

(αk − α?k)γτK(xk, xτ )

− εN∑k=1

(αk + α?k) +N∑k=1

yk(αk − α?k)−M∑

τ=N+1

γτψ

such that

N∑k=1

(−αk + α?k) +

M∑τ=N+1

γτ = 0

αk, α?k ∈ [0, c] (7.6)

Using the value of w from eq.(7.5), the estimated function from eq.(3.1)can be written as following

f(x) =N∑k=1

(αk − α?k)K(x, xk)−M∑

τ=N+1

γτK(x, xτ ) + b


To solve the constrained estimation with the dual problem, it is requiredto write explicitly the dual problem every time more constraints are addedor the current constraints are modified. Also, it is also not possible to writethe dual problem in terms of kernel functions for every type of constraints.Next, SVM with random feature spaces is described that makes it possibleto solve the optimization problem without explicitly converting to dual formand hence kernel function need not be defined. 1

7.2.2 Solving with the random feature space

Figure 7.2: Random Feature Space based SVM for constrained estimation

Frenay et al.[5] has proposed a way to merge the ELM and SVM ap-proaches by defining a new method to explicitly define the feature space.This feature space is called random feature space as the parameters used todefine it can be selected randomly. In ELM, the input vectors are mappedto the hidden layer neurons by a randomly generated matrix[3]. This isanalogous to defining a new feature space where the hidden layer acts as a

1Random feature space can be related to a kernel known as ELM kernel[5]

7.3. ELM WITH EXTERNAL CONSTRAINTS 55

transformation from the input vector space to the hidden neurons space. So,for example, the feature space can be defined as following for a sigmoidalfunction

φi(xk) =1

1 + exp(−wi.xk − b), i = 1, . . . , h

φ(xk) =[φ1(xk) φ2(xk) . . . φh(xk)

](7.7)

The mapping φ(.) : Rn → Rh takes the input vector xk ∈ Rn to the h-dimensional space Rh, where h is the dimension of the high dimensionalfeature space. In [25], Liu also proposes to use explicitly defined featurespaces to form an Extreme Support Vector Machine (ESVM). The optimiza-tion problem in eq.(7.3) can now be solved without kernels as the featuremapping is known. Knowing the feature mapping enables to write the ex-ternal constraints in the optimization problem rather than solving the dualproblem that utilizes the kernels but needs to be formulated every time theconstraints are changed. In [5], ELM kernel has been introduced based onrandom feature spaces that can enable to use the Fixed Size SVM approachin case of large data sets. The ELM kernel[5] is defined as

k(xk, xl) =1

pφxkφxl

where φ is defined as in equation 7.7. Fig.7.2 shows the SVM method forconstrained estimation.

7.3 ELM with external constraints

The SVM output can be written as

f(x) =

Ns∑s=1

(αs − α?s)K(x, xs) + b (7.8)

Eq.(7.8) suggests that SVM can be compared to a generalized single-hidden layer feed forward networks [8]. Note that eq. (7.8) is same aseq.(3.12) where s denotes the support vectors. α is zero for all but supportvectors. A comparision of eq. (7.8) and eq. (3.22) suggests that kernelk(x, xs) is comparable to activation function h(x) and the Lagrangian factors(αs−α?s) are comparable to output weights β. So, there can be a possibilityfor combining ELM and SVM[8]. In [8], Huang et al. has suggested twoways to combine SVM and ELM theory:

• using random kernels

• using a optimization based ELM

For this thesis, optimization based ELM is more suitable as it will helpto include a set of external constraints. For details on kernel based ELM,please refer to [8] and [9].


7.3.1 Optimization based ELM

Consider the training data xk, ykNk=1 where x ∈ Rn and y ∈ Rm. TheELM as presented in eq.(3.28) obtains zero training error based on theorem1. The ELM based on the Moore-Penrose generalised inverse is a solutionto the following problem:

minimizeN∑i=1

‖βh(xi)− yi‖

and

minimize ‖β‖ (7.9)

The Moore-Penrose generalized inverse is a least square solution to eq.(7.9).It is possible to formulate ELM as an optimization problem with an errorbound as presented below[8], [9].

minβ,ξ1,ξ2 : Lp =1

2‖β‖2 + C

N∑i=1

(ξ1 + ξ2)

such that:h(xi)β ≤ yi − ξ1, i = 1, . . . , N

h(xi)β ≥ yi − ξ2, i = 1, . . . , N

ξ1, ξ2 ≥ 0, i = 1, . . . , N (7.10)

The ELM presented in eq.(7.10) prevents possible over-fitting as it as-sumes error bounds and has the possibility to improve the generalizationperformance[8]. The activation function is formed using random w and b asdescribed earlier. So, in this case, it is possible to include a set of externalconstraints as described in next section.

Optimization based ELM with external constraints

The optimization based ELM provides a way to include some sets of externalconstraints. As already explained, in time series forecasting, it is highlydesirable to have the possibility of including external constraints. Fromhere on, the optimization based ELM is called ELM-variant [8]. The ELM-variant with a set of external constraints is described below.

minβ,ξ1,ξ2 : Lp =1

2‖β‖2 + C

N∑i=1

(ξ1 + ξ2)

such that:h(xi)β ≤ yi − ξ1, i = 1, . . . , N

h(xi)β ≥ yi − ξ2, i = 1, . . . , N

ξ1, ξ2 ≥ 0, i = 1, . . . , N (7.11)

h(xτ )β ≤ ςτ , τ ∈ S,S ⊆ [N + 1, . . . ,M ] (7.12)

7.4. RESULTS FOR A ARTIFICIAL KNOWN PROCESS 57

τ is the future time for which we want to add constraints and M ≥ N + 1.Since activation function can be formed by using random w and b, the opti-mization problem can be extended to include external constraints as givenin eq. (7.12). Fig. 7.3 shows the ELM-variant with external constraints.

Figure 7.3: ELM-variant with external constraints

7.4 Results for a artificial known process

The random feature space based SVM and ELM-variant is applied to aartificial process in this section. The function to be estimated is:

y = (x2 − 1)x4 exp(−x)

such that φ(xtest) ≥ L.B.L.B. ≤ φ(xtest) ≤ U.B. (7.13)

L.B. means lower bound and U.B means upper bound.

7.4.1 Random feature space based SVM

SVM is applied to the function of eq. (7.13). The feature space is chosenexplicitly as a sigmoidal function and the parameters (w, b) are chosen ran-


domly. The dimension of the feature space is predetermined. The dimensionof feature space does not affect the results as long as it is of same magnitudeas the size of input data. Fig. 7.4 shows the result of SVM without anyconstraints.

−1 −0.5 0 0.5 1−0.5

0

0.5

1

1.5

2

x

y

original fnSVM result

Figure 7.4: SVM results (Out of Sample)

Constraint: φ(xtest) >= 0

Figure 7.5 shows the results of SVM for this constraint. In fig.7.4 the SVMestimates negative values. This constraint direct the optimization solver tofind (w, b) such that for the points in the testing data xtest, the function isalways greater than zero.

Constraint: 0 <= φ(xtest) <= 1.4

Figure 7.6 shows the results of SVM for this constraint. In fig.7.5, themaximum estimated value is more than 1.4. This constraint put a upperbound on the estimated value such that for the points in the testing dataxtest, the function is always greater than zero but less than 1.4 .

7.4.2 ELM variant

ELM -variant is applied to the function in eq.(7.13). Fig.7.7 show the resultsof ELM-variant without any constraints.

Constraint: φ(xtest) >= 0

Figure 7.8 shows the results of ELM variant for this constraint. This con-straint direct the optimization solver to find (w, b) such that for the points


−1 −0.5 0 0.5 10

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

x

y


Figure 7.5: SVM results for constraint 1(Out of Sample)

−1 −0.5 0 0.5 10

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

x

y


Figure 7.6: SVM results for constraint 2(Out of Sample)

in the testing data xtest, the function is always greater than zero.

Constraint: 0 <= φ(xtest) <= 1.5

Figure 7.9 shows the results of ELM variant for this constraint. In fig.7.8,the maximum estimated value is more than 1.5. This constraint put a upperbound on the estimated value such that for the points in the testing dataxtest, the function is always greater than zero but less than 1.5 .


−1 −0.5 0 0.5 1−0.5

0

0.5

1

1.5

2

x

y

original fnELM result

Figure 7.7: ELM results for constraint 1(Out of Sample)

−1 −0.5 0 0.5 10

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

x

y




−1 −0.5 0 0.5 10

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

x

y



Chapter 8

Case Study - PV InfeedForecasting

8.1 Photovoltaic Infeed forecast model

The underlying equation for the PV infeed forecast is same as the Spot Priceforecast. The PV infeed can be written as:

yt = yt + εt (8.1)

where yt is the PV infeed with average yt and εt is a white noise processwith zero expectation and a finite variance i.e. ε ∈ (0, ϑ2). The model forthe hourly PV infeed is

yt = yt + χt (8.2)

Here, yt is the estimated PV infeed with estimated average yt and χt is awhite noise process with zero expectation and a finite variance i.e. χt ∈(0, ϑ2). The estimated average is written as the function of a regressionvector xt in Rn

yt = f(xt) (8.3)

8.2 Characteristics of PV infeed

Photovoltaics infeed depend on a variety of factors. The amount of diffusedradiation received on earth is correlated to temperature of the region. It canalso be correlated to the precipitation. Additionally, PV time series of twogiven years shows significant degree of cross correlation. PV infeed showmulti scale seasonality also. All these characteristics are discussed below indetail:

63

64 CHAPTER 8. CASE STUDY - PV INFEED FORECASTING

Autocorrelation

Fig.8.1 shows the cross-correlation between PV infeed for 2012(March-June)and 2011(March-June). The blue lines show 95% confidence bounds. Anyvalue of cross correlation factor outside t he blue lines show a significantdegree of cross-correlation.

−20 −15 −10 −5 0 5 10 15 20−0.3

−0.2

−0.1

0

0.1

0.2

0.3

0.4

Lag

Sam

ple

Cro

ss C

orre

latio

n

Cross Correlation Function of PV infeed for 2011 and 2012

Figure 8.1: Cross correlation of PV infeed time series for 2011 and 2012

Multi-Scale Seasonality

PV time series shows different types of seasonality. Fig. 8.2 shows the PVinfeed for 4 months in a 3-d view. The PV infeed is plotted for every hourand arranged by different weeks. The hour axis show 168 hours of the week.The hour axis shows the intra-day seasonality. In-feed starts with zero,increases as the sun increases the elevation angle and then decreases to zeroagain in night. The week axis show the PV infeed over a period of 4 monthsstarting from March to June. The amount of in-feed increases as the monthchange from March to June and summer approches.

Temperature

PV in-feed depend on temperature also. Temperature of a region can give anindication of the amount of diffused radiation received from the Sun duringa given day. So, the correlation between PV infeed and temperature can beused to build a model for PV in-feed. Fig. 8.3 shows the relation betweenPV in-feed and temperature.

8.2. CHARACTERISTICS OF PV INFEED 65

0 20 40 60 80 100 120 140 160 1800

5

10

15

20

0

5000

10000

weeks

hours

PV

inne

ed (

MW

)

Figure 8.2: Multi Scale Seasonality of PV infeed

0 50 100 150 200 250 300 350 400−0.1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Nor

mal

ized

tem

pera

ture

and

PV

infe

ed

hours

PV infeedMean Temperature

Figure 8.3: Correlation of PV infeed and Mean Temperature


8.3 Model for PV infeed

Based on the analysis of time series of PV, a NARX model is proposed.NARX models are able to explain the autoregressive components and theeffect of external factors. The input vector for the NARX model of PVconsists of:

1. The Autoregressive part: The PV infeed is regressed with a windowconsisting of previous year PV infeed values. The value of lag is keptas a variable.

2. The external inputs: The external inputs taken in the input vectorare:

• Weather variables include maximum temperature Tmaxt , mini-mum temperature Tmint , mean temperature Tmeant , heating daysHdt, cooling days Cdt and precipitation PPt. They can be groupedtogether as Weat

Weat =[Tmaxt Tmint Tmeant Hdt Cdt PPt

]• To capture the monthly and intraday seasonality, deterministic

binary values are used due to their simplicity[10]. Hours are rep-resented by vectors Ht ∈ 0, 124 and months by Mt ∈ 0, 112

The regression vector can be written as

xa,t = ya−1,t−1, . . . , ya−1,t−j ,Weat, Ht,Mt

j is the amount of lag that we want to include in the model and a is theyear.

8.4 Implementation

The PV infeed model is applied to the PV time series for Germany takenfrom the Tennet website [26]. Fig. 8.4 shows the PV time series and followingpoints describes the data used in detail:

• The data is from March to June for both 2012 and 2011.

• The resolution of the in-feed is one hour.

• Unless stated otherwise, the length of the insample data is 70% of thetraining data. The rest of the training data is used for the out ofsampling predictions to find out the preformance of the model. Thelength of the both insample and out of sample data is rounded off tothe nearest multiple of 24.

8.5. SVM RESULTS 67

0 500 1000 1500 2000 2500 30000

1000

2000

3000

4000

5000

6000

7000

8000

9000

PV

infe

ed (

MW

)

hours

PV infeed(March 2012−Jun 2012)

Figure 8.4: PV infeed time series from March 2012 to Jun 2012

• All the data for weather variables come from Bloomberg.

PV infeed model is implemented in MATLAB and CVX [19] is used as theoptimization solver.

8.5 SVM results

8.5.1 SVM with random feature space without constraints

Fig.8.5 shows the training results of SVM. The SVM is fully able to capturethe information in the time series. Blue lines show the trained values andred line show the actual data. Every blue point lies on the red line whichshows good fitting of the data.

Fig. 8.6 show the results for insample data and Fig.8.7 show out ofsample data. There are no constraints in these results. In-sample dataresults are able to match the profile of infeed accurately as expected. Intra-day peak is captured successfully. For out of sample simulation, the resultsare not as close to the actual in-feed. However, intra day seasonality iscaptured successfully in this case also. PV infeed should be zero duringnight, evening and early morning. The estimation of PV during these hoursshould give zero values. In sample simulations give values very close to zerobut out of sample simulations gives non zero values. For some hours, out ofsample in-feed is negative also.

8.5.2 SVM with random feature space with constraints

Two sets of constraints are applied on the PV time series at the time oftraining the model. The aim of the constraints is to check if the model can


0 100 200 300 400 500 600−0.2

0

0.2

0.4

0.6

0.8

1

1.2

hours

Nor

mal

ized

PV

infe

ed

Figure 8.5: Training results (SVM with random feature space,without con-straints)

0 10 20 30 40 50 60 70 80 90 100−0.05

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45SVM results

hours

PV

infe

ed

Actual InfeedSVM result

Figure 8.6: In Sample results (SVM with random feature space,withoutconstraints)

8.5. SVM RESULTS 69

0 10 20 30 40 50 60 70 80 90 100−0.2

0

0.2

0.4

0.6

0.8

1

1.2SVM results

hours

PV

infe

ed

Actual infeedSVM result

Figure 8.7: Out of Sample results (SVM with random feature space,withoutconstraints)

be trained to make non-negative estimation and an upper cap on the valueof thr PV in-feed.

Constraint 1:

Following constraints were applied on future data:

0 ≤ wTφ(xτ ) + b ≤ 0.8, τ ∈ [10, . . . , 20]

0 ≤ wTφ(xτ ) + b ≤ 1, τ ∈ [1, . . . , 96] (8.4)

where τ is the future time and is relative to the end of insample time. So,[10, . . . , 20] means [t+ 10, . . . , t+ 20], t is the length of the insample timeseries data and so on.

Fig. 8.8 show the results for in-sample data and Fig. 8.9 show out ofsample data for constraint 1. Comparison of fig. 8.8 and fig. 8.6 shows thatThe in sample results are not affected by the addition of constraints. Out ofsample results show considerable improvement in terms of removing negativein-feed. Also, the constraint of upper limit on the PV in-feed is satisfied forhours 10-20. However, out of sample results are affected significantly in theregion where the PV in-feed is zero.


0 10 20 30 40 50 60 70 80 90 100−0.05

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45SVM results

hours

PV

infe

ed

Actual InfeedSVM result

Figure 8.8: In Sample results (SVM with random feature space,with con-straints 1)

0 10 20 30 40 50 60 70 80 90 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1SVM results

hours

PV

infe

ed


Figure 8.9: Out of Sample results (SVM with random feature space,withconstraints 1)

8.6. ELM RESULTS 71

Constraint 2: Following constraints were applied on future data:

0 ≤ wTφ(xτ ) + b ≤ 0.8, τ ∈ [10, . . . , 20]

0 ≤ wTφ(xτ ) + b ≤ 0.6, τ ∈ [30, . . . , 40]

0 ≤ wTφ(xτ ) + b ≤ 0.65, τ ∈ [55, . . . , 65]

0 ≤ wTφ(xτ ) + b ≤ 0.4, τ ∈ [80, . . . , 90]

0 ≤ wTφ(xτ ) + b ≤ 1, τ ∈ [1, . . . , 96] (8.5)

Fig.8.10 show out of sample simulation for the second set of constraints.The second set of constraints is more strict than the first set in the sensethat an upper limit is applied to all the peaks of PV in-feed. In this casealso, the negative values have been removed and the upper limit is satisfiedfor all peaks but the problem of non-zero infeed, when the actual infeed iszero, remains.

0 10 20 30 40 50 60 70 80 90 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9SVM results

hours

PV

infe

ed


Figure 8.10: Out of Sample results (SVM with random feature space,withconstraints 2)

8.6 ELM results

This section describes the results of ELM-variant for the PV model. Onlythe out of sample results for constrained estimation are given here. Forcomplete results, please refer Appendix 2.


8.6.1 ELM-variant with constraints


0 ≤ H(xτ )β ≤ 0.8, τ ∈ [10, . . . , 20]

0 ≤ H(xτ )β ≤ 1, τ ∈ [1, . . . , 96] (8.6)

where τ is the future time and is relative to the end of insample time. So,[10, . . . , 20] means [t+ 10, . . . , t+ 20], t is the length of the insample timeseries data and so on. Fig. A.7 and Fig.8.12 show the results .

0 10 20 30 40 50 60 70 80 90 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1ELM results

hours

PV

infe

ed

Actual infeedELM result

Figure 8.11: Out of Sample results (ELM with constraints 1)


0 ≤ H(xτ )β ≤ 0.8, τ ∈ [10, . . . , 20]

0 ≤ H(xτ )β ≤ 0.6, τ ∈ [30, . . . , 40]

0 ≤ H(xτ )β ≤ 0.65, τ ∈ [55, . . . , 65]

0 ≤ H(xτ )β ≤ 0.4, τ ∈ [80, . . . , 90]

0 ≤ H(xτ )β ≤ 1, τ ∈ [1, . . . , 96] (8.7)

where τ is the future time and is relative to the end of insample time. So,[10, . . . , 20] means [t+ 10, . . . , t+ 20], t is the length of the insample timeseries data and so on. For both of the out of sample simulations of theELM-variant, the constraints are satisfied. There is no negative infeed andthe upper limit is satisfied. But the problem of non-zero in-feed similar tothe SVM results remains.

8.6. ELM RESULTS 73

0 10 20 30 40 50 60 70 80 90 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7ELM results

hours

PV

infe

ed


Figure 8.12: Out of Sample results (ELM with constraints 2)

Chapter 9

Conclusion

In the first part of the thesis, Support Vector Machine and Extreme Learn-ing Machine are presented to train the models for time series forecasting inelectricity markets. A NARX model is developed for spot price forecasting.The electricity spot prices show autocorrelation and multi-scale seasonality.Furthermore, prices are also correlated with weather parameters and aggre-gate load profile. The model is trained using LSSVM and ELM. LSSVMapproach required tuning of parameters which is done using 10-fold cross-validation. The simulation is done for 1-day ahead, 3-day ahead and 5-dayahead forecasting. The amount of lag to capture the autocorrelation is alsovaried for better accuracy. 1-day lag, 2-day lag and 4-day lag is used to simu-late the spot prices. For finding the accuracy of forecast, two error estimatesare used: MAPE and MAE. For in-sample simulation, ELM works betterthan LSSVM but LSSVM outperforms ELM for out of sample simulations.For the model, 4-day lag gives the best performance. Both LSSVM andELM give good performance results for transition periods also. ELM worksmuch faster than LSSVM as it does not require any tuning of parameters.

In the second part of the thesis, constrained estimation is presented. Themodel is subjected to external constraints and then trained using sampledata. This ensures that the prediction can follow possible future develop-ments. It is proposed to use the random feature spaces based SVM to solvethe constrained estimation problem. Explicit construction of feature spacesmakes it easy to include the external constraints in the optimization soft-ware. Converting the primal problem to dual problem is not required. Thedual form has the possible advantage of using kernel functions but it needto be reformulated every time the constrains are modified which is a tedioustask. Also, the if the feature spaces do not appear in dot product form,the dual problem also requires explicit construction of feature space. Hence,random feature space based SVM is a better solution technique for con-strained estimation. The ELM theory is applied to constrained estimationproblem using the optimization based ELM. ELM based on moore-penrose

75

76 CHAPTER 9. CONCLUSION

generalized inverse is a least square solution and hence external constraintsare difficult to implement. Optimization based ELM makes it easy to includethe constraints in the optimization solver. Both SVM and ELM are appliedto PV in-feed forecast subjected to external constraints. The constraints areused to remove the negative in-feed of PV and to cut the peaks of the PVin-feed during the day. SVM and ELM produce good performance resultson both sets of constraints.

Outlook

Large data sets

The tuning of parameters of SVM poses a challenge if one wants to useSVM for large data sets. LSSVM method can be applied to large data setsunder the framework of fixed-size LSSVM [2]. In fixed-size LSSVM, the highdimensional mapping is estimated separately and the optimization problemof SVM is solved in primal space. The support vectors need to be chosen firstand this method can overcome the problem with computationally intensivestandard LSSVM.

Interval forecast

Spot prices forecasts can be extended to include interval forecast. This canhelp in making price scenarios and also put a confidence interval on the pointforecasts.

Analysis of residuals in constrained estimation

The question of validation of models in constrained estimation should beaddressed. Without constraints, the model can be validated by analyzingthe residuals. When the model is subjected to constraints, the residuals aredifferent than the no constraint case.

Effect of constraints

Adding constraints at the time of training can lead to infeasible or incom-plete optimization problem. A error catching mechanism should be designedto report any constraints that lead to such problems.

Appendix A

LSSVM and ELM completeresults

0 5 10 15 20 2540

60

80

100

120

140

160

hours

euro

/MW

h

LSSVM results(in−sample,MeanLSSVM

=74.34,StdLSSVM

=24.35,Meanepex

=78.95,Stdepex

=29.60)

LSSVM forecastEPEX prices

Figure A.1: LSSVM In Sample fit - 1 Day forecast

77

78 APPENDIX A. LSSVM AND ELM COMPLETE RESULTS

0 10 20 30 40 50 60 70 8040

60

80

100

120

140

160

180

200

220

hours

euro

/MW

h

LSSVM results(in−sample,MeanLSSVM

=82.30,StdLSSVM

=30.16,Meanepex

=85.35,Stdepex

=35.92)


Figure A.2: LSSVM In Sample fit - 3 Day forecast

0 5 10 15 20 2530

35

40

45

50

55

hours

euro

/MW

h

LSSVM results(out of sample, MeanLSSVM

=45.10,StdLSSVM

=5.58,Meanepex

=42.16,Stdepex

=5.73)


Figure A.3: LSSVM Out of Sample fit - 1 Day forecast

79

0 10 20 30 40 50 60 70 8030

35

40

45

50

55

60

65

70

hours

euro

/MW

h

LSSVM results(out of sample, MeanLSSVM

=43.59,StdLSSVM

=7.50,Meanepex

=43.78,Stdepex

=9.40)


Figure A.4: LSSVM Out of Sample fit - 3 Day forecast

0 5 10 15 20 2520

40

60

80

100

120

140

160

hours

euro

/MW

h

ELM results(in−sample,Meanelm

=69.48,Stdelm

=21.86,Meanepex

=78.95,Stdepex

=29.60)

ELM forecastEPEX prices

Figure A.5: ELM In Sample fit - 1 Day forecast


0 10 20 30 40 50 60 70 8020

40

60

80

100

120

140

160

180

200

220

hours

euro

/MW

h

ELM results(in−sample,Meanelm

=74.25,Stdelm

=29.19,Meanepex

=85.35,Stdepex

=35.92)


Figure A.6: ELM In Sample fit - 3 Day forecast

0 5 10 15 20 2530

35

40

45

50

55

60

65

hours

euro

/MW

h

ELM results(out of sample, Meanelm

=51.74,Stdelm

=9.43,Meanepex

=42.16,Stdepex

=5.73)


Figure A.7: ELM Out of Sample fit - 1 Day forecast

81

0 10 20 30 40 50 60 70 8020

30

40

50

60

70

80

hours

euro

/MW

h

ELM results(out of sample, Meanelm

=52.12,Stdelm

=12.15,Meanepex

=43.78,Stdepex

=9.40)


Figure A.8: ELM Out of Sample fit - 3 Day forecast

Appendix B

Complete Results forConstrained Estimation

0 100 200 300 400 500 600−0.2

0

0.2

0.4

0.6

0.8

1

1.2

hours

Nor

mal

ized

PV

infe

ed

Figure B.1: ELM-variant training results

83

84APPENDIX B. COMPLETE RESULTS FOR CONSTRAINED ESTIMATION

0 10 20 30 40 50 60 70 80 90 100−0.05

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45ELM results

hours

PV

infe

ed


Figure B.2: In-sample results for PV in-feed(ELM, no constraint)

0 10 20 30 40 50 60 70 80 90 100−0.2

0

0.2

0.4

0.6

0.8

1

1.2ELM results

hours

PV

infe

ed


Figure B.3: Out of sample results for PV in-feed(ELM, no constraint)

85

0 2 4 6 8 10 12 14 16 18 20−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Lag

Sam

ple

Aut

ocor

rela

tion

Autocorrelation of residuals(In Sample)

Figure B.4: Autocorrelation of in-sample residuals(SVM, no constraint)

0 2 4 6 8 10 12 14 16 18 20−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Lag

Sam

ple

Aut

ocor

rela

tion

Autocorrelation of residuals(Out of Sample)

Figure B.5: Autocorrelation of out of sample residuals(SVM, no constraint)


0 2 4 6 8 10 12 14 16 18 20−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Lag

Sam

ple

Aut

ocor

rela

tion


Figure B.6: Autocorrelation of in-sample residuals(SVM, constraint 1)

0 2 4 6 8 10 12 14 16 18 20−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Lag

Sam

ple

Aut

ocor

rela

tion


Figure B.7: Autocorrelation of in-sample residuals(SVM, constraint 2)

87

0 2 4 6 8 10 12 14 16 18 20−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Lag

Sam

ple

Aut

ocor

rela

tion


Figure B.8: Autocorrelation of out of sample residuals(SVM, constraint 1)

0 2 4 6 8 10 12 14 16 18 20−0.5

0

0.5

1

Lag

Sam

ple

Aut

ocor

rela

tion


Figure B.9: Autocorrelation of out of sample residuals(SVM, constraint 2)

Bibliography

[1] Stein-Erik Fleten and Jacob Lemming. Constructing forward pricecurves in electricity markets. Energy Economics, 25:409–424, 2003.

[2] J. A. K. Suykens, T. Van Gestel, J. de Brabanter, B. De Moor, andJ. Vandewalle. Least Square Support Vector Machines. World ScientificPublishing Co. Pte. Ltd., 2002.

[3] Guang-Bin Huang, Qin-Yu Zhu, and Chee-Kheong Siew. Extreme learn-ing machine: Theory and applications. Neurocomputing, 70(13):489 –501, 2006.

[4] Marcus Hildmann, Evdokia Kaffe, Yang He, and Goran Andersson.Combined estimation and prediction of the hourly price forward curve.IEEE Power and Energy Society General Meeting, 2012.

[5] Benoit Frenay and Michel Verleysen. Using svms with randomisedfeature spaces: an extreme learning approach. In ESANN 2010proceedings, European Symposium on Artificial Neural Networks-Computational Intelligence and Machine Learning., April 2010.

[6] Qiuge Liu, Qing He, and Zhongzhi Shi. Extreme support vector machineclassifier. PAKDD, pages 222–233, 2008.

[7] J. A. K. Suykens and J. Vandewalle. Training multilayer perceptronclassifiers based on a modified support vector method. IEEE transac-tions on Neural Networks, 10:907–911, 1999.

[8] Guang-Bin Huang, Xiaojian Ding, and Hongming Zhou. Optimizationmethod based extreme learning machine for classification. Neurocom-puting, 74(1a3):155–163, December 2010.

[9] Guang-Bin Huang, Xiaojian Ding, Hongming Zhou, and Rui Zhang.Extreme learning machine for regression and multiclass classification.IEEE transactions on systems, man and cybernetics, 42(2), April 2012.

[10] Marcelo Espinoza, J. A. K. Suykens, Ronnie Belmans, and Bart Demoor. Electric load forecasting. IEEE Control Systems Magazine,27:43–57, 2007.

89

90 BIBLIOGRAPHY

[11] Rafal Weron and Adam Misiorek. Forecasting spot electricity prices:A comparison of parametric and semiparametric time series models.MPRA, (10428), 2008.

[12] Adam Misiorek, Stefan Trueck, and Rafal Weron. Point and intervalforecasting of spot electricity prices: Linear vs. non-linear time seriesmodels. Studies in Nonlinear Dynamics and Econometrics, 10, 2006.

[13] Dr. Aurelio Fetz. Fundamental aspects of power markets - price fore-casting. Part of Strommarkt-II lectures at ETH Zurich, Spring 2012,29 February 2012.

[14] Marcelo Espinoza, Tillmann Flack, Johan A. K. Suykens, and Bart DeMoor. Time series prediction using ls-svms. ESTSP, 2008.

[15] Marcelo Espinoza, Johan A.K. Suykens, and Bart De Moor. Ls-svm re-gression with autocorrelated errors. IFAC Symposium on System Iden-tification, 14, 2006.

[16] Gustaf Unger. Hedging Strategy and Electricity Contract Engineering.PhD thesis, ETH Zuerich Diss No. 14727, 2002.

[17] S. Haykin. Neural Networks: A Comprehensive Foundation. PrenticeHall New Jersey, 1999.

[18] V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag,1995.

[19] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cam-bridge University Press, 2011.

[20] C.R.Rao and S.K. Mitra. Generalized Inverse of Matrices and its Ap-plications. Wiley New York, 1971.

[21] Peter L. Bartlett. The sample complexity of pattern classification withneural networks: The size of the weights is more important than thesize of the network. IEEE TRANSACTIONS ON INFORMATIONTHEORY, 44(2), March 1998.

[22] George E. P. Box and Gwilym M. Jenkins. Time Series Analysis fore-casting and control. Holden-Day Inc., 1976.

[23] EPEX European Power Exchange. www.epex.com, last accessed July2012.

[24] Prof. Peter Buhlmann. Computational Statistics. Seminar for Statistik,ETH Zuerich, Spring 2008.

BIBLIOGRAPHY 91

[25] Qiuge Liu, Qing He, and Zhongzhi Shi. Extreme support vector machineclassifier. PAKDD, pages 222–233, 2008.

[26] TenneT TSO Network Figures. www.tennettso.de, last accessed July2012.

[27] Guang-Bin Huang and Lei Chen. Convex incremental extreme learningmachine. Neurocomputing, 70(16a18):3056–3062, October 2007.

[28] Stanley Weardon Shirley Dowdy and Daniel Chilko. Basics of Statistics- Statistics for Research. Wiley Seriesin Probability and Statistics, 2004.

[29] Gene H. Golub, Michael Heath, and Grace Wahba. Generalized cross-validation as a method for choosing a good ridge parameter. Techno-metrics, 21:215–223, 1979.

photovoltaic output prediction using support vector machines

Documents

estimation of time series

nonlinear estimation

machine learning algorithm

learning algorithms

spotprice time series

support vector machines

power systemlaboratories

extremelearning machine