Wind Power Forecasting Algorithms and Application

Statistics Seminar – Toulouse School of Economics

Ricardo Bessa, 13 Dec. 2011

Statistics Seminar | 13 Dec. 2011 | Toulouse School of Economics

Talk Overview

• Introduction to the wind power forecasting problem

• Information Theoretic Learning for Wind Power Point Forecast

• Time-adaptive Quantile-Copula for Wind Power Uncertainty Forecast

• Application: Setting the operating reserve (very brief presentation)

• Avenues for future research

Introduction

• Wind power forecasting is vital for two groups of end-users

– system operators → managing the power system

– wind power producers → bidding in the electricity market

• Synergy between three research areas

– Meteorology (physical): numerical weather predictions

– Statistics (mathematical): time series and data mining models

– Power systems (forecast value): decision-making problems with forecasts as input

• It is a business

– several companies sell wind power forecasts as a service

• Prewind Lda, spin-off company from three institutes in Portugal

– other companies sell forecasting systems

Exhibit A: Nonlinear Conversion of Wind Speed to Power

The non-linear part of the power curve amplifies the wind speed forecast error.

[Figure: power curve mapping the wind speed error distribution to the wind power distribution]
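The amplification effect can be illustrated with a small sketch. The cubic power curve below, with its cut-in, rated and cut-out speeds, is a hypothetical stand-in and not the curve of any wind farm in the talk. It only shows that the same wind speed error produces very different power errors depending on the operating point.

```python
def power_curve(v, cut_in=3.0, rated=12.0, cut_out=25.0):
    """Hypothetical normalized power curve (p.u.): cubic between cut-in and rated speed."""
    if v < cut_in or v >= cut_out:
        return 0.0
    if v >= rated:
        return 1.0
    return (v ** 3 - cut_in ** 3) / (rated ** 3 - cut_in ** 3)


def power_error(v_forecast, speed_error):
    """Power error (p.u.) caused by a wind speed forecast error at a given operating point."""
    return power_curve(v_forecast + speed_error) - power_curve(v_forecast)


# The same +1 m/s speed error at three operating points of the assumed curve.
for v in (5.0, 10.0, 15.0):
    print(f"forecast {v:4.1f} m/s, +1 m/s error -> {power_error(v, 1.0):+.3f} p.u.")
```

On the steep part of this assumed curve the power error is largest, while above rated speed the same speed error has no effect.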

Exhibit B: Noisy Data

[Figure: power curve with forecasted wind speed]

Exhibit C: Evolving Structure of Data

• Continuous stream of measured data from the wind farms

• Changes in the data (or concept drift)

– limitation of the maximum produced energy

– new wind farm in the same region

– maintenance of wind farms

– loss in performance

– changes in the wind speed prediction model

– etc…

• Time-adaptive models are needed!


Statistical and Physical Forecasting Models

[Diagram: physical models, statistical models, and hybrid models (physical + statistical)]

Very Short-term Wind Power Forecasting (~6 hrs ahead)

Wind Speed Forecasting Wind Power Forecasting

Kalman Filter Adaptive Linear Models

Grey Predictor Fuzzy Time Series

Takagi-Sugeno Self-exciting Threshold Autoregressive

Discrete Hilbert Transform Smooth Transition Autoregressive

Abductive Networks (GMDH) Markov-switching Autoregressive

Adaptive Fuzzy Logic Models

Adaptive Linear Models

ARIMA time series models

Neural Networks Adaptive Neural Fuzzy Inference System


SCADA (supervisory control and data acquisition): past values (t-1, t-2, …)

Short-term Wind Power Forecasting (~72hrs ahead)

Methods

• Neural Networks

• Support Vector Machines

• Regression Trees with Bagging

• Random Forests

• Adaptive Neural Fuzzy System

• Mixture of Experts

• Nearest Neighbor Search

• Autoregressive with Exogenous input (ARX)

• Locally Recurrent Neural Networks

• Local Polynomial Regression

• Takagi-Sugeno FIS

• Fuzzy Neural Networks

• Bayesian Clustering by Dynamics (BCD)

NWP: Numerical Weather Predictions (e.g. Meteo France)

Model Chain for Forecasting

[Diagram: model chains that produce probabilistic forecasts: (i) NWP point forecasts → probabilistic model; (ii) NWP ensemble → probabilistic model; (iii) NWP point forecasts → wind power point forecast model → probabilistic model]

R.J. Bessa, C. Monteiro, V. Miranda, A. Botterud, J. Wang, and G. Conzelmann, “Wind power forecasting: state-of-the-art 2009,” Report ANL/DIS-10-1, Argonne National Laboratory, 2009. http://www.dis.anl.gov/projects/windpowerforecasting.html

Information Theoretic Learning for Wind Power Forecast

R.J. Bessa, V. Miranda, and J. Gama, “Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting,” IEEE Transactions on Power Systems, vol. 24, no. 4, pp. 1657-1666, Nov. 2009

R.J. Bessa, V. Miranda, A. Botterud, and J. Wang, “‘Good’ or ‘bad’ wind power forecasts: a relative concept,” Wind Energy, vol. 14, pp. 625-636, Jul. 2011

Information Theoretic Learning for Wind Power Forecast

• Wind power forecast errors are non-Gaussian

– The Mean Square Error (MSE) criterion is optimal only when the errors follow a Gaussian distribution

• The Information Theoretic Learning (ITL) idea…

– The ideal case is when the pdf of the errors ε = T − O is a Dirac delta function, i.e. all errors have the same value

– With all errors equal, one could have perfect matching between output O and target T by adding a bias term

[Figure: pdf of the errors as a Dirac delta δ(ε), with $\int_{-\infty}^{+\infty} \delta(\varepsilon)\, d\varepsilon = 1$ and ε = T − O]

Correntropy as a Cost Function

• Correntropy is a generalized similarity measure between Target (T) and Output (O)

• MCC (Maximum Correntropy Criterion) → max Jσ → maximizing the pdf of errors at the origin

$$J_\sigma(T, O) = \frac{1}{N} \sum_{i=1}^{N} G_\sigma(t_i - o_i)$$

• G_σ: Gaussian kernel with kernel size σ

• J_σ can be interpreted as the probability that T = O, p(T = O), in a joint space defined by σ
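A minimal sketch of the sample correntropy, assuming the usual normalized Gaussian kernel. The data and the kernel size σ = 0.5 are made up for illustration. It contrasts how a single gross outlier affects MSE and correntropy.

```python
import math


def gaussian_kernel(e, sigma):
    """Gaussian kernel G_sigma evaluated at the error e."""
    return math.exp(-e * e / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def correntropy(targets, outputs, sigma):
    """Sample correntropy J_sigma(T, O) = (1/N) * sum_i G_sigma(t_i - o_i)."""
    n = len(targets)
    return sum(gaussian_kernel(t - o, sigma) for t, o in zip(targets, outputs)) / n


def mse(targets, outputs):
    """Mean square error, for comparison."""
    n = len(targets)
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / n


# Hypothetical data: `noisy` has small errors everywhere,
# `outlier` has three perfect outputs and one gross error.
targets = [0.2, 0.4, 0.6, 0.8]
noisy = [0.25, 0.35, 0.65, 0.75]
outlier = [0.2, 0.4, 0.6, 5.8]
sigma = 0.5

for name, outputs in (("noisy", noisy), ("outlier", outlier)):
    print(name, "MSE:", mse(targets, outputs),
          "correntropy:", correntropy(targets, outputs, sigma))
```

Each sample contributes at most G_σ(0) to the correntropy, so the outlier costs at most one sample's share, whereas its squared error dominates the MSE.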

Correntropy Induced Metric (CIM)

• Contours of CIM(X, 0): distance to the origin in a 2-D space (for σ = 1)

• Small errors: Euclidean (behaves like MSE)

• Medium-size errors: Manhattan

• Large errors: indifference

[Figure: contours of CIM(X, 0), both axes ranging from -8 to 8]
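The regimes listed above can be checked numerically. The sketch below assumes the standard definition CIM(X, Y) = sqrt(G_σ(0) − J_σ(X, Y)) with a Gaussian kernel, which the slide does not spell out. The test points are arbitrary.

```python
import math


def gaussian_kernel(e, sigma):
    """Gaussian kernel G_sigma evaluated at the error e."""
    return math.exp(-e * e / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def cim(x, y, sigma=1.0):
    """Correntropy induced metric: sqrt(G_sigma(0) - J_sigma(X, Y))."""
    n = len(x)
    j = sum(gaussian_kernel(a - b, sigma) for a, b in zip(x, y)) / n
    return math.sqrt(max(gaussian_kernel(0.0, sigma) - j, 0.0))


origin = (0.0, 0.0)
for point in [(0.1, 0.0), (0.2, 0.0), (8.0, 0.0), (16.0, 0.0), (8.0, 8.0)]:
    print(point, round(cim(point, origin), 4))
```

Doubling a small error roughly doubles the distance (Euclidean-like), while doubling a large error leaves it practically unchanged (indifference).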

Results for a Real Wind Farm with a Neural Network

[Figure: distribution of the normalized error [% of rated power] for MSE and MCC training. MCC shows a higher frequency of errors close to zero; the error distribution is not Gaussian.]

[Figure: NMAE [% of rated power] for MSE and MCC]

ITL-based Training: Remarks

• ITL criteria (correntropy in particular) presented better results than MSE, in terms of

– higher frequency of errors close to zero

– insensitivity to outliers

– better fit of the predictions to the observed values

• ITL criteria cannot be ignored when building robust wind power prediction models

• The ITL criteria were used here with neural networks; however, they can be applied to other statistical and machine learning methods

Wind Power Uncertainty Forecast

R.J. Bessa, V. Miranda, A. Botterud, J. Wang and Z. Zhou, “Time-adaptive quantile-copula for wind power probabilistic forecasting,” Renewable Energy, vol. 40, pp. 29-39, April 2012.

Why Kernel Density Forecast?

• The information provided by point forecasts is unsatisfactory for some decision-making problems

• Modeling of wind power uncertainty is “distribution free”

• Uncertainty characterized by the full distribution

• Fast method for uncertainty modeling

• High flexibility to represent uncertainty (e.g. density func., quantiles, mass func.)

• Captures multimodal pdf


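Kernel Density Estimation (KDE)

Conditional KDE: estimating the density of a r.v. Y, knowing that the explanatory r.v. X is equal to x:

$$\hat{f}(y \mid X = x) = \frac{\hat{f}_{XY}(x, y)}{\hat{f}_X(x)}$$

where $\hat{f}_{XY}(x, y)$ is the joint (multivariate) density function of X and Y, and $\hat{f}_X(x)$ is the marginal density of X:

$$\hat{f}_X(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h} K\left(\frac{x - X_i}{h}\right)$$

With a Gaussian kernel $K(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$:

$$\hat{f}_X(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\sqrt{2\pi}\, h} \, e^{-\frac{(x - X_i)^2}{2 h^2}}$$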
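A minimal sketch of the conditional KDE as a ratio of joint and marginal estimates. The product Gaussian kernel for the joint density, the bandwidths and the small data set are assumptions made for illustration only.

```python
import math


def gauss(z):
    """Standard Gaussian kernel K(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h):
    """Marginal estimate: (1/N) * sum_i (1/h) * K((x - X_i) / h)."""
    return sum(gauss((x - xi) / h) for xi in sample) / (len(sample) * h)


def joint_kde(x, y, xs, ys, hx, hy):
    """Joint estimate with a product kernel (an assumption of this sketch)."""
    n = len(xs)
    return sum(gauss((x - xi) / hx) * gauss((y - yi) / hy)
               for xi, yi in zip(xs, ys)) / (n * hx * hy)


def conditional_kde(y, x, xs, ys, hx, hy):
    """Conditional estimate f(y | X = x) = f_XY(x, y) / f_X(x)."""
    return joint_kde(x, y, xs, ys, hx, hy) / kde(x, xs, hx)


# Hypothetical pairs: forecasted wind speed (m/s) and measured power (p.u.).
xs = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [0.05, 0.10, 0.20, 0.35, 0.50, 0.70, 0.85]

for y in (0.2, 0.35, 0.5):
    print(y, conditional_kde(y, 7.0, xs, ys, hx=1.0, hy=0.05))
```

The division by the marginal estimate is the step that becomes numerically fragile when few samples lie near x.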

Quantile-Copula Estimator

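Copula definition: a multivariate distribution function can be separated into

• marginal functions

• the dependency structure between the marginals, modeled by the copula

$$F_{XY}(x, y) = C\big(F_X(x), F_Y(y)\big)$$

With u = F_X(x), v = F_Y(y) and the copula density function $c(u, v) = \frac{\partial^2 C(u, v)}{\partial u\, \partial v}$:

$$f_{XY}(x, y) = f_X(x) \cdot f_Y(y) \cdot c(u, v)$$

$$f(y \mid X = x) = \frac{f_X(x) \cdot f_Y(y) \cdot c(u, v)}{f_X(x)} = f_Y(y) \cdot c(u, v)$$

KDE estimator of the marginal density of Y:

$$\hat{f}_Y(y) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h_y} K\left(\frac{y - Y_i}{h_y}\right)$$

KDE estimator of the copula density:

$$\hat{c}(u, v) = \frac{1}{N} \sum_{i=1}^{N} K\left(\frac{u - U_i}{h_u}\right) \cdot K\left(\frac{v - V_i}{h_v}\right)$$

where U_i = F_X^e(X_i), V_i = F_Y^e(Y_i), and F^e is the empirical cumulative distribution function.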

Quantile-Copula Estimator

• Advantages

– avoids the numerical instability of methods based on the Nadaraya-Watson estimator when the denominator is close to zero

– for a problem with several explanatory variables, this method has only one kernel product, instead of two

– purely based on density estimation methods

– gives the full pdf for the wind power uncertainty

$$\hat{f}(y \mid X = x) = \left[\frac{1}{N \cdot h_y} \sum_{i=1}^{N} K_y\left(\frac{y - Y_i}{h_y}\right)\right] \cdot \left[\frac{1}{N} \sum_{i=1}^{N} K_u\left(\frac{F_X^e(x) - F_X^e(X_i)}{h_u}\right) \cdot K_v\left(\frac{F_Y^e(y) - F_Y^e(Y_i)}{h_v}\right)\right]$$

O.P. Faugeras, “A quantile-copula approach to conditional density estimation,” Journal of Multivariate Analysis, vol. 100, no. 9, pp. 2083-2099, Oct. 2009.
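A minimal sketch of the quantile-copula estimator for a single explanatory variable. It assumes scaled Gaussian kernels K_h(z) = K(z/h)/h throughout (the talk uses beta kernels for variables in [0,1]), and the data and bandwidths are hypothetical.

```python
import math
from bisect import bisect_right


def gauss(z):
    """Standard Gaussian kernel K(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def scaled_kernel(z, h):
    """Scaled kernel K_h(z) = K(z / h) / h (normalization assumed in this sketch)."""
    return gauss(z / h) / h


def ecdf(sample):
    """Return the empirical cumulative distribution function F^e of `sample`."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda t: bisect_right(ordered, t) / n


def quantile_copula_density(y, x, xs, ys, hy, hu, hv):
    """Estimate f(y | X = x) = f_Y(y) * c(u, v), with u = F_X^e(x) and v = F_Y^e(y)."""
    n = len(xs)
    fx_e, fy_e = ecdf(xs), ecdf(ys)
    u, v = fx_e(x), fy_e(y)
    f_y = sum(scaled_kernel(y - yi, hy) for yi in ys) / n
    c_uv = sum(scaled_kernel(u - fx_e(xi), hu) * scaled_kernel(v - fy_e(yi), hv)
               for xi, yi in zip(xs, ys)) / n
    return f_y * c_uv


# Hypothetical, strongly dependent data: wind speed (m/s) and power (p.u.).
xs = [float(i) for i in range(1, 21)]
ys = [i / 20 for i in range(1, 21)]

concordant = quantile_copula_density(0.75, 15.0, xs, ys, hy=0.05, hu=0.1, hv=0.1)
discordant = quantile_copula_density(0.25, 15.0, xs, ys, hy=0.05, hu=0.1, hv=0.1)
print(concordant, discordant)
```

The estimate is a product of two sums and involves no division, which is the property the first advantage above refers to.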

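Time-adaptive Estimator

Recursive estimator, for stationary data streams (the weight 1/n of a new observation is approximately 0 when n → ∞):

$$\hat{f}_n(x) = \frac{n-1}{n} \cdot \hat{f}_{n-1}(x) + \frac{1}{n \cdot h_i} \cdot K\left(\frac{x - X_i}{h_i}\right)$$

Exponential smoothing, for nonstationary data streams:

$$\hat{f}_n(x) = \lambda \cdot \hat{f}_{n-1}(x) + \frac{1-\lambda}{h_i} \cdot K\left(\frac{x - X_i}{h_i}\right)$$

with forgetting factor $\lambda = \frac{n}{n+1}$.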
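A minimal sketch of the exponential-smoothing update, with the density stored on a fixed grid. The grid, the fixed bandwidth, the value λ = 0.95, the initialization from the first observation and the synthetic stream with an abrupt change are all assumptions made for illustration.

```python
import math


def gauss(z):
    """Standard Gaussian kernel K(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


class AdaptiveKDE:
    """Density on a fixed grid, updated as f_n = lam * f_{n-1} + (1 - lam) / h * K((x - X_n) / h)."""

    def __init__(self, grid, h, lam):
        self.grid = list(grid)
        self.h = h
        self.lam = lam
        self.density = None  # set by the first observation (an assumption of this sketch)

    def update(self, x_new):
        contrib = [gauss((g - x_new) / self.h) / self.h for g in self.grid]
        if self.density is None:
            self.density = contrib
        else:
            self.density = [self.lam * d + (1.0 - self.lam) * c
                            for d, c in zip(self.density, contrib)]

    def mode(self):
        """Grid point with the highest estimated density."""
        best = max(range(len(self.grid)), key=lambda k: self.density[k])
        return self.grid[best]


grid = [i * 0.01 - 0.5 for i in range(201)]  # grid from -0.5 to 1.5
model = AdaptiveKDE(grid, h=0.05, lam=0.95)

# Synthetic stream: values around 0.3, then an abrupt change to values around 0.7.
stream = [0.3 + 0.02 * ((i % 5) - 2) for i in range(200)]
stream += [0.7 + 0.02 * ((i % 5) - 2) for i in range(200)]
for x in stream:
    model.update(x)

print("mode after the change:", model.mode())
```

Because each update is a convex combination of densities, the estimate keeps integrating to one while old observations are forgotten at a geometric rate.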

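Time-adaptive Quantile-Copula Estimator

Time-adaptive empirical cumulative distribution function:

$$F_t^e(x) = \lambda \cdot F_{t-1}^e(x) + (1 - \lambda) \cdot \mathrm{I}(x_i \le x)$$

Time-adaptive conditional KDE:

$$\hat{f}_t(y \mid X = x) = \hat{f}_t(y) \cdot \hat{c}_t(u, v)$$

$$\hat{f}_t(y) = \lambda \cdot \hat{f}_{t-1}(y) + (1 - \lambda) \cdot K_h\left(\frac{y - Y_i}{h}\right)$$

$$\hat{c}_t(u, v) = \lambda \cdot \hat{c}_{t-1}(u, v) + (1 - \lambda) \cdot K_1\left(\frac{F_X^e(x) - F_X^e(X_i)}{h_q}\right) \cdot K_2\left(\frac{F_Y^e(y) - F_Y^e(Y_i)}{h_q}\right)$$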
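The time-adaptive empirical distribution function can be sketched on a grid as follows. The grid, the stream, λ = 0.9 and the initialization from the first observation are assumptions of this example.

```python
def update_ecdf(cdf_values, grid, x_new, lam):
    """One step of F_t(x) = lam * F_{t-1}(x) + (1 - lam) * I(x_new <= x) on a grid."""
    return [lam * f + (1.0 - lam) * (1.0 if x_new <= g else 0.0)
            for f, g in zip(cdf_values, grid)]


grid = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0

# Synthetic stream with a change: observations at 0.2, then at 0.8.
stream = [0.2] * 100 + [0.8] * 100

# Initialize from the first observation (an assumption of this sketch).
cdf = [1.0 if stream[0] <= g else 0.0 for g in grid]
for x in stream[1:]:
    cdf = update_ecdf(cdf, grid, x, lam=0.9)

for g, f in zip(grid, cdf):
    print(f"F({g:.1f}) = {f:.5f}")
```

After the change, the probability mass that the estimate had placed at 0.2 decays geometrically, so F(0.5) ends up close to zero.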

Formulation of the Wind Power Forecast Problem

Forecast the wind power pdf at time step t for each look-ahead time step t+k of a given time horizon, knowing a set of explanatory variables (NWP forecasts, wind power measured values, hour of the day)

[Figure: axes Wind Speed (m/s), 0 to 20, and Wind Power (p.u.), 0 to 1]

$$\hat{f}_P\big(p_{t+k} \mid X = x_{t+k|t}\big) = \frac{\hat{f}_{P,X}\big(p_{t+k}, x_{t+k|t}\big)}{\hat{f}_X\big(x_{t+k|t}\big)}$$

Quantile-Copula Estimator vs Classical Multivariate KDE

[Figure: copula density c(u, z) over u = Fv(V) [wind speed] and z = Fp(P) [power], both from 0 to 1]

[Figure: joint density function over wind speed (m/s), 0 to 20, and power (p.u.), 0 to 1]

Kernel Choice

• Depends on the type and range of the variable

• In this problem we have two different types

– variables with range [0,1] (wind power and the quantile transforms u and v) → beta kernels

– circular variables (hour of the day and wind direction) → von Mises distribution

[Figure: density of wind power [p.u.] estimated with a beta kernel and with a Gaussian kernel (a kernel for variables with range ]-∞,+∞[)]

[Figure: von Mises distribution for k = 0.1, 1, 5, 30, 60, shown on a circle (0, 90, 180, 270); circular density from density.circular(x = data, bw = 1, kernel = "vonmises"), N = 5000, bandwidth = 1]

Macro-beta Kernel

• Density estimates built from beta kernels may

– not integrate to 1

– be inconsistent for distributions with point mass at 0% and 100%

• This is due to a lack of normalization; the idea is a modified beta kernel estimator (named “macro-beta”)

$$\hat{f}'(x) = \frac{\hat{f}(x)}{\int_0^1 \hat{f}(x)\, dx}$$

The normalization is applied to the conditional density function.
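A sketch of the normalization step. The slides do not state which beta kernel is used; the code assumes one common form, in which the estimate at x is the mean of Beta(x/b + 1, (1 − x)/b + 1) densities evaluated at the samples. It normalizes a marginal estimate for simplicity, whereas the talk normalizes the conditional function. Sample values and the bandwidth b are made up.

```python
import math


def beta_pdf(t, p, q):
    """Density of the Beta(p, q) distribution at t in [0, 1]."""
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
    return math.exp(log_norm) * t ** (p - 1.0) * (1.0 - t) ** (q - 1.0)


def beta_kde(x, sample, b):
    """Beta kernel estimate at x (assumed form: shape parameters depend on x and b)."""
    p, q = x / b + 1.0, (1.0 - x) / b + 1.0
    return sum(beta_pdf(xi, p, q) for xi in sample) / len(sample)


def integrate(f, n=2000):
    """Midpoint rule on [0, 1]."""
    step = 1.0 / n
    return sum(f((k + 0.5) * step) for k in range(n)) * step


# Hypothetical wind power observations in p.u.
sample = [0.05, 0.10, 0.20, 0.35, 0.50, 0.65, 0.80, 0.90, 0.95]
b = 0.05

# Integral of the raw estimate over [0, 1]; in general it is not exactly 1.
mass = integrate(lambda x: beta_kde(x, sample, b))


def macro_beta(x):
    """Normalized estimate f'(x) = f(x) / integral of f over [0, 1]."""
    return beta_kde(x, sample, b) / mass


print("integral before normalization:", mass)
print("integral after normalization:", integrate(macro_beta))
```

Dividing by the numerically computed integral restores a proper density on [0,1] at the cost of one extra quadrature per estimate.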

Case-Study and Evaluation Metrics

• NREL dataset: wind power generation of 15 sites

• U.S. Midwest wind farm

– 42 hrs ahead forecasts

– NWP produced by a GFS + WRF model chain

• Benchmark algorithms

– Splines quantile regression (or additive quantile regression)

– Nadaraya-Watson KDF estimator

• Evaluation metrics

– Calibration: deviation between forecasted (nominal) and empirical probabilities, e.g. 85% of the observed values should be lower than or equal to the 85% quantile. This is the primary requirement, measured by the deviation from perfect calibration

– Sharpness: mean size of the forecast intervals (target: narrow intervals)

– Skill score: combines information about the predictor’s performance in a single measure (the less negative the better, 0 for perfect probabilistic forecasts)
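The three metrics can be sketched as below. The sign convention of the calibration deviation (empirical minus nominal) and the quantile-based form of the skill score are assumptions, since the slide gives no formulas; the skill score shown is one that is never positive and equals 0 only when every quantile coincides with the observation. All numbers are made up.

```python
def calibration_deviation(quantile_forecasts, observations, nominal):
    """Empirical proportion of observations <= forecast quantile, minus the nominal level."""
    hits = sum(1 for q, p in zip(quantile_forecasts, observations) if p <= q)
    return hits / len(observations) - nominal


def sharpness(lower, upper):
    """Mean length of the forecast intervals."""
    return sum(u - l for l, u in zip(lower, upper)) / len(lower)


def skill_score(quantiles_by_level, observation):
    """Sum over levels a of (I(p < q_a) - a) * (p - q_a); always <= 0 (assumed form)."""
    return sum(((1.0 if observation < q else 0.0) - a) * (observation - q)
               for a, q in quantiles_by_level.items())


# Hypothetical example: a constant 85% quantile forecast against ten observations.
observations = [i / 10 for i in range(1, 11)]
q85 = [0.85] * len(observations)
dev = calibration_deviation(q85, observations, 0.85)

# Hypothetical 50% central intervals and one set of quantiles for a single time step.
width = sharpness([0.3, 0.2], [0.7, 0.5])
score = skill_score({0.25: 0.3, 0.50: 0.5, 0.75: 0.7}, observation=0.5)
print(dev, width, score)
```

A forecaster can trivially improve sharpness by narrowing the intervals, which is why calibration is checked first and the skill score is used as a combined measure.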

Proof of Concept – Data with Artificial Change

NREL dataset: test dataset Oct-Dec, connection of 2 sites after Oct


[Figure: calibration, deviation [%] versus nominal proportion rate [%] (5 to 95), for the offline version and for Lambda = 0.999, 0.995 and 0.99; the plot is annotated with the regions of quantiles underestimation and quantiles overestimation]

Evaluation Results – Offline Version

[Figure: calibration, deviation [%] versus nominal proportion rate (5 to 95), for QC, NW and SplinesQR]

Evaluation Results – Offline Version

[Figure: skill score versus look-ahead time [hours] (6 to 48), for QC, NW and SplinesQR]

[Figure: sharpness, intervals mean length [% of rated power] versus nominal coverage rate [%] (10 to 90), for QC, NW and SplinesQR]

Evaluation Results – Time-adaptive Version

[Figure: calibration, deviation [%] versus nominal proportion rate (1 to 98), comparing the offline and time-adaptive versions (legend includes n=2738); panels: Quantile-copula and Nadaraya-Watson estimator]

Evaluation Results – Time-adaptive Version

[Figure: sharpness, intervals mean length [% of rated power] versus nominal coverage rate [%] (10 to 90); legend includes Offline]

[Figure: skill score versus look-ahead time [hours] (6 to 48); legend includes Offline]

Kernel Density Forecast: Remarks

• The time-adaptive QC copes with evolving and non-Gaussian data

• It provides probabilistic forecasts that can be used in several decision-making problems

• Quantile-copula tends to perform better in terms of calibration

• Quantile regression methods tend to perform better in terms of sharpness

• The skill scores of quantile-copula and quantile regression are rather similar

• The time-adaptive approach changes the bias of the probabilistic forecasts (calibration), while only slightly changing the sharpness and resolution

Application: Setting the Power System Operating Reserve

• The operation of electric energy systems must take into account possible imbalances between generation and load

– Failures of the generating units

– Load forecast errors

– Wind, solar, etc. forecast errors

• This leads to the need for an operating reserve able to respond to possible problems

– Generating units that can take load immediately or in a short period

• The TSO (Transmission System Operator) must define this reserve for the next hours

• The recent increase in wind power penetration has made this exercise more difficult, due to the forecasting uncertainty

– until recently, TSOs defined the reserve based on empirical rules (such as the largest unit + 2% of the forecasted load)

Application: Setting the Power System Operating Reserve

[Diagram: decision-aid tool for setting the reserve]

• Inputs to the probabilistic model: W (uncertain wind power generation), L (uncertain load), C (uncertain conventional generation)

• System generation margin model: M = (C + W) − L

• Outputs: risk/reserve curve and, together with the reserve cost, the risk/reserve cost curve

• Decision aid: combines the curves with the decision maker preferences to obtain the preferred reserve level

Tool developed in the EU project Anemos.plus and demonstrated over 6 months for an end-user

M.A. Matos and R.J. Bessa, “Setting the operating reserve using probabilistic wind power forecasts,” IEEE Transactions on Power Systems, vol. 26, no. 2, pp. 594-603, May 2011.
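The margin model M = (C + W) − L and a risk/reserve curve can be sketched with small discrete distributions. Everything below is hypothetical: the probability tables, the independence of C, W and L, and the risk definition P(M + R < 0) are assumptions for illustration, not the formulation of the tool described in the talk.

```python
from itertools import product

# Hypothetical discrete distributions (MW -> probability).
conventional = {1000.0: 0.95, 800.0: 0.05}        # C: a 200 MW unit may fail
wind = {50.0: 0.2, 100.0: 0.5, 150.0: 0.3}        # W: from a probabilistic forecast
load = {1000.0: 0.25, 1050.0: 0.5, 1100.0: 0.25}  # L: uncertain load


def margin_distribution(c_dist, w_dist, l_dist):
    """Distribution of M = (C + W) - L, assuming C, W and L are independent."""
    dist = {}
    for (c, pc), (w, pw), (l, pl) in product(c_dist.items(), w_dist.items(), l_dist.items()):
        m = (c + w) - l
        dist[m] = dist.get(m, 0.0) + pc * pw * pl
    return dist


def risk(margin_dist, reserve):
    """Assumed risk measure: probability that the reserve does not cover a negative margin."""
    return sum(p for m, p in margin_dist.items() if m + reserve < 0.0)


margin = margin_distribution(conventional, wind, load)

# Risk/reserve curve: risk as a function of the reserve level R (MW).
for reserve in (0.0, 50.0, 100.0, 200.0, 300.0):
    print(f"R = {reserve:5.0f} MW -> risk = {risk(margin, reserve):.4f}")
```

A decision maker would then trade this curve off against the reserve cost to pick a preferred reserve level.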

Avenues for Future Research

• Method/rule for computing the “optimal” kernel bandwidth for the wind power problem

• Simultaneous density forecast (includes the temporal correlation of forecast errors)

– the alternative is to have time trajectories (i.e. random vectors)

• Extreme events forecasting (e.g. extreme wind speed → disconnection of several wind farms)

• Evaluation of probabilistic forecasts (utility-based evaluation)

