Wind Power Forecasting Algorithms and Application
Statistics Seminar – Toulouse School of Economics
Ricardo Bessa ([email protected]) 2011 DEC,13
Statistics Seminar | 13 Dec. 2011 | Toulouse School of Economics
Talk Overview
• Introduction to the wind power forecasting problem
• Information Theoretic Learning for Wind Power Point Forecast
• Time-adaptive Quantile-Copula for Wind Power Uncertainty Forecast
• Application: Setting the operating reserve (very brief presentation)
• Avenues for future research
Introduction
• Wind power forecasting is vital for two groups of end-users
– system operators → managing the power system
– wind power producers → bidding in the electricity market
• Synergy between three research areas
– Meteorology (physical): numerical weather predictions
– Statistics (mathematical): time series and data mining models
– Power systems (forecast value): decision-making problems with forecasts as input
• It is a business
– several companies sell wind power forecasts as a service
• Prewind Lda, spin-off company from three institutes in Portugal
– other companies sell forecasting systems
Exhibit A: Nonlinear Conversion of Wind Speed to Power
The non-linear part of the power curve amplifies the wind speed forecast error
[Figure: power curve converting the wind speed error distribution into the wind power distribution]
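The amplification effect can be illustrated with a small sketch. The cubic power curve and its cut-in, rated and cut-out speeds below are hypothetical values chosen for illustration, not taken from the talk. The same +1 m/s wind speed error is applied on the steep part and on the flat part of the curve.

```python
def power_curve(v, cut_in=3.0, rated=12.0, cut_out=25.0):
    """Hypothetical normalized power curve in p.u. (illustrative only)."""
    if v < cut_in or v >= cut_out:
        return 0.0  # turbine stopped
    if v >= rated:
        return 1.0  # flat part at rated power
    # Cubic ramp between cut-in and rated speed
    return (v ** 3 - cut_in ** 3) / (rated ** 3 - cut_in ** 3)


def power_error(v_true, speed_error):
    """Power error (p.u.) caused by a wind speed forecast error (m/s)."""
    return power_curve(v_true + speed_error) - power_curve(v_true)


steep = power_error(8.0, 1.0)   # on the cubic ramp
flat = power_error(15.0, 1.0)   # on the flat part at rated power
print(f"+1 m/s at 8 m/s  -> {steep:.3f} p.u.")
print(f"+1 m/s at 15 m/s -> {flat:.3f} p.u.")
```

Under these assumptions the same speed error produces a power error of more than 10% of rated power on the ramp and none on the flat part.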
Exhibit B: Noisy Data
Power curve with forecasted wind speed
Exhibit C: Evolving Structure of Data
• Continuous stream of measured data from the wind farms
• Changes in the data (or concept drift)
– limitation of the maximum produced energy
– new wind farm in the same region
– maintenance of wind farms
– loss in performance
– changes in the wind speed prediction model
– etc…
• Time-adaptive models are needed!
Statistical and Physical Forecasting Models
Physical Statistical
Hybrid: Physical + Statistical
Very Short-term Wind Power Forecasting (~6 hrs ahead)
Wind Speed Forecasting       | Wind Power Forecasting
Kalman Filter                | Adaptive Linear Models
Grey Predictor               | Fuzzy Time Series
Takagi-Sugeno                | Self-exciting Threshold Autoregressive
Discrete Hilbert Transform   | Smooth Transition Autoregressive
Abductive Networks (GMDH)    | Markov-switching Autoregressive
Adaptive Fuzzy Logic Models
Adaptive Linear Models
ARIMA time series models
Neural Networks Adaptive Neural Fuzzy Inference System
SCADA (supervisory control and data acquisition): past values (t-1, t-2, …)
Short-term Wind Power Forecasting (~72hrs ahead)
Methods
Neural Networks Support Vector Machines
Regression Trees with Bagging Random Forests
Adaptive Neural Fuzzy System
Mixture of Experts Nearest Neighbor Search
Autoregressive with Exogenous input (ARX)
Locally Recurrent Neural Networks
Local Polynomial Regression Takagi-Sugeno FIS
Fuzzy Neural Networks Bayesian Clustering by Dynamics (BCD)
NWP: Numerical Weather Predictions (e.g. Meteo France)
Model Chain for Forecasting
NWP Point Forecasts → Probabilistic Model → Probabilistic Forecasts
NWP Ensemble → Probabilistic Model → Probabilistic Forecasts
NWP Point Forecasts → Wind Power Point Forecast Model → Probabilistic Model → Probabilistic Forecasts
R.J. Bessa, C. Monteiro, V. Miranda, A. Botterud, J. Wang, and G. Conzelmann, “Wind power forecasting: state-of-the-art 2009,” Report ANL/DIS-10-1, Argonne National Laboratory, 2009. http://www.dis.anl.gov/projects/windpowerforecasting.html
Information Theoretic Learning for Wind Power Forecast
NWP Point Forecasts → Wind Power Point Forecast Model → Probabilistic Model → Probabilistic Forecasts
R.J. Bessa, V. Miranda, and J. Gama, “Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting,” IEEE Transactions on Power Systems, vol. 24, no. 4, pp. 1657-1666, Nov. 2009
R.J. Bessa, V. Miranda, A. Botterud, and J. Wang, “‘Good’ or ‘bad’ wind power forecasts: a relative concept,” Wind Energy, vol. 14, pp. 625-636, Jul. 2011
Information Theoretic Learning for Wind Power Forecast
• Wind power forecast errors are non-Gaussian
– Mean Square Error (MSE) is only optimal under Gaussian distribution
• The Information Theoretic Learning (ITL) idea…
– The ideal case is when the pdf of errors ε is a Dirac function - all errors equal (of the same value)
– with all errors equal, one could have perfect matching between output O and target T, by adding a bias term
[Figure: pdf of the errors as a Dirac delta δ(ε), with ε = T − O]
Correntropy as a Cost Function
• Correntropy is a generalized similarity measure between Target (T) and Output (O)
• MCC (Maximum Correntropy Criterion) → max Jσ → maximizing the pdf of errors at the origin
$$J_\sigma(T, O) = \frac{1}{N} \sum_{i=1}^{N} G_\sigma(t_i - o_i)$$
G: Gaussian kernel
J_σ can be interpreted as the probability that T = O, p(T = O), in a joint space defined by σ
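A minimal sketch of the correntropy estimate, J_σ(T, O) = (1/N) Σ G_σ(t_i − o_i), next to the MSE. The targets, outputs and the kernel size σ = 0.1 are made-up illustrative values. The two output vectors differ only in the size of a single large error.

```python
import math


def gaussian_kernel(e, sigma):
    """Gaussian kernel G_sigma evaluated at the error e."""
    return math.exp(-e * e / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def correntropy(targets, outputs, sigma):
    """Sample estimate J_sigma(T, O) = (1/N) * sum_i G_sigma(t_i - o_i)."""
    total = sum(gaussian_kernel(t - o, sigma) for t, o in zip(targets, outputs))
    return total / len(targets)


def mse(targets, outputs):
    """Mean square error between targets and outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


# Illustrative data (p.u.): four exact outputs and one outlier
targets = [0.3, 0.5, 0.7, 0.9, 1.0]
outputs_a = [0.3, 0.5, 0.7, 0.9, 0.4]  # outlier error of 0.6
outputs_b = [0.3, 0.5, 0.7, 0.9, 0.1]  # outlier error of 0.9
sigma = 0.1

j_a, j_b = correntropy(targets, outputs_a, sigma), correntropy(targets, outputs_b, sigma)
mse_a, mse_b = mse(targets, outputs_a), mse(targets, outputs_b)
print(f"correntropy: {j_a:.6f} vs {j_b:.6f}")
print(f"MSE:         {mse_a:.4f} vs {mse_b:.4f}")
```

With these values the correntropy is practically unchanged when the outlier grows, while the MSE more than doubles, which is the insensitivity to large errors that motivates MCC.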
Correntropy Induced Metric (CIM)
• Contours of CIM (X,0) - distance to the origin in a 2d space (for σ = 1)
• Small errors: Euclidean (behaves like MSE)
• Medium size errors: Manhattan
• Large errors: indifference
[Figure: contours of CIM(X,0) in a 2d space, both axes from -8 to 8]
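The three regimes can be checked numerically. The sketch below assumes the usual definition CIM(X, Y) = sqrt(G_σ(0) − J_σ(X, Y)) with a Gaussian kernel and σ = 1; the helper names are illustrative.

```python
import math


def gaussian_kernel(e, sigma):
    """Gaussian kernel G_sigma evaluated at the error e."""
    return math.exp(-e * e / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def cim(x, y, sigma):
    """CIM(X, Y) = sqrt(G_sigma(0) - J_sigma(X, Y)) for two equal-length samples."""
    j = sum(gaussian_kernel(a - b, sigma) for a, b in zip(x, y)) / len(x)
    # max(..., 0.0) guards against tiny negative values from rounding
    return math.sqrt(max(gaussian_kernel(0.0, sigma) - j, 0.0))


def cim_to_origin(e, sigma=1.0):
    """Distance of a single error e to the origin."""
    return cim([e], [0.0], sigma)


# Small errors: doubling the error roughly doubles the distance (Euclidean-like)
small_ratio = cim_to_origin(0.2) / cim_to_origin(0.1)
# Large errors: the distance saturates, so larger errors are not penalized more
large_gap = cim_to_origin(16.0) - cim_to_origin(8.0)
print(f"ratio for small errors: {small_ratio:.3f}")
print(f"gap for large errors:   {large_gap:.2e}")
```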
Results for a Real Wind Farm with a Neural Network
[Figure: distribution of the normalized error [% of rated power] and NMAE [% of rated power] for training with MSE and with MCC]
MCC: higher frequency of errors close to zero
Error distribution: not Gaussian
ITL-based Training: Remarks
• ITL criteria (correntropy in particular) presented better results than MSE, in terms of
– higher frequency of errors close to zero
– insensitivity to outliers
– better fit of the predictions to the observed values
• ITL criteria cannot be ignored when building robust wind power prediction models
• The ITL criteria were used here with neural networks, but they can be applied to other statistical and machine learning methods
Wind Power Uncertainty Forecast
NWP Point Forecasts → Probabilistic Model → Probabilistic Forecasts
NWP Point Forecasts → Wind Power Point Forecast Model → Probabilistic Model → Probabilistic Forecasts
R.J. Bessa, V. Miranda, A. Botterud, J. Wang and Z. Zhou, “Time-adaptive quantile-copula for wind power probabilistic forecasting,” Renewable Energy, vol. 40, pp. 29-39, April 2012.
Why Kernel Density Forecast?
• The information provided by point forecasts is unsatisfactory for some decision-making problems
• Modeling of wind power uncertainty is “distribution free”
• Uncertainty characterized by the full distribution
• Fast method for uncertainty modeling
• High flexibility to represent uncertainty (e.g. density func., quantiles, mass func.)
• Captures multimodal pdf
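Kernel Density Estimation (KDE)
Conditional KDE: estimating the density of a r.v. Y, knowing that the explanatory r.v. X is equal to x
$$\hat{f}(y \mid X = x) = \frac{\hat{f}_{XY}(x, y)}{\hat{f}_X(x)}$$
where $\hat{f}_{XY}$ is the joint or multivariate density function of X and Y, and $\hat{f}_X$ is the marginal density of X
Kernel estimator of the marginal density, with kernel K and bandwidth h:
$$\hat{f}_X(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h} K\left(\frac{x - X_i}{h}\right)$$
With a Gaussian kernel:
$$\hat{f}_X(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h\sqrt{2\pi}} \, e^{-\frac{(x - X_i)^2}{2 h^2}}$$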
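A minimal sketch of the conditional KDE as the ratio of a joint and a marginal kernel estimate, using Gaussian product kernels. The wind speed and power samples and the bandwidths hx and hy are made-up illustrative values.

```python
import math


def gauss(u):
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, h):
    """Marginal estimate f_X(x) = (1/N) * sum_i (1/h) * K((x - X_i) / h)."""
    return sum(gauss((x - xi) / h) for xi in samples) / (len(samples) * h)


def conditional_kde(y, x, xs, ys, hx, hy):
    """Conditional estimate f(y | X = x) = f_XY(x, y) / f_X(x)."""
    n = len(xs)
    joint = sum(gauss((x - xi) / hx) * gauss((y - yi) / hy)
                for xi, yi in zip(xs, ys)) / (n * hx * hy)
    return joint / kde(x, xs, hx)


# Illustrative samples: wind speed (m/s) and wind power (p.u.)
wind_speed = [4.0, 5.5, 6.0, 7.5, 8.0, 9.0, 10.5, 12.0]
wind_power = [0.05, 0.12, 0.18, 0.35, 0.42, 0.60, 0.85, 0.98]

# Conditional density of power given a forecasted wind speed of 8 m/s
grid = [-1.0 + 0.01 * k for k in range(301)]
density = [conditional_kde(y, 8.0, wind_speed, wind_power, hx=1.0, hy=0.05) for y in grid]
total = sum(density) * 0.01  # Riemann sum, should be close to 1
print(f"integral of f(y | X = 8) over y: {total:.4f}")
```

Note that a Gaussian kernel places some mass outside [0, 1], which is why the integration grid extends beyond the physical range of the power.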
Quantile-Copula Estimator
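Copula definition: a multivariate distribution function separated into
• marginal functions
• dependency structure between the marginals, modeled by the copula
$$F_{XY}(x, y) = C\left(F_X(x), F_Y(y)\right)$$
$$f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x \, \partial y} = f_X(x) \cdot f_Y(y) \cdot c(u, v), \qquad c(u, v) = \frac{\partial^2 C(u, v)}{\partial u \, \partial v}$$
where c(u, v) is the copula density function, with u = F_X(x) and v = F_Y(y)
$$f(y \mid X = x) = \frac{f_X(x) \cdot f_Y(y) \cdot c(u, v)}{f_X(x)} = f_Y(y) \cdot c(u, v)$$
KDE estimator of the marginal density of Y:
$$\hat{f}_Y(y) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h_y} K\left(\frac{y - Y_i}{h_y}\right)$$
KDE estimator of the copula density:
$$\hat{c}(u, v) = \frac{1}{N} \sum_{i=1}^{N} K\left(\frac{u - U_i}{h_u}\right) \cdot K\left(\frac{v - V_i}{h_v}\right)$$
with $U_i = F_X^e(X_i)$ and $V_i = F_Y^e(Y_i)$, where $F^e$ is the empirical cumulative distribution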
Quantile-Copula Estimator
• Advantages
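– avoids the numerical instability of methods based on the Nadaraya-Watson estimator, which occurs when the denominator is close to zero
– for a problem with several explanatory variables, this method has only one kernel product, instead of two
– purely based on density estimation methods
– gives the full pdf for the wind power uncertainty
$$\hat{f}(y \mid X = x) = \left[\frac{1}{N \cdot h_y} \sum_{i=1}^{N} K_y\left(\frac{y - Y_i}{h_y}\right)\right] \cdot \left[\frac{1}{N} \sum_{i=1}^{N} K_u\left(\frac{F_X^e(x) - F_X^e(X_i)}{h_u}\right) \cdot K_v\left(\frac{F_Y^e(y) - F_Y^e(Y_i)}{h_v}\right)\right]$$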
O.P. Faugeras, “A quantile-copula approach to conditional density estimation,” Journal of Multivariate Analysis, vol. 100, no. 9, pp. 2083-2099, Oct. 2009.
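The quantile-copula estimator, f(y | X = x) = f_Y(y) · c(u, v), can be sketched as follows. This is an illustrative sketch, not the implementation used in the talk: it uses Gaussian kernels for simplicity (and therefore divides the copula term by h_u · h_v so that it is a density estimate), and the data and bandwidths are made up.

```python
import math


def gauss(u):
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def ecdf(value, samples):
    """Empirical cumulative distribution F^e(value)."""
    return sum(1 for s in samples if s <= value) / len(samples)


def quantile_copula_density(y, x, xs, ys, h_y, h_u, h_v):
    """Estimate f(y | X = x) = f_Y(y) * c(u, v), with u = F_X^e(x), v = F_Y^e(y)."""
    n = len(xs)
    u, v = ecdf(x, xs), ecdf(y, ys)
    us = [ecdf(xi, xs) for xi in xs]  # U_i = F_X^e(X_i)
    vs = [ecdf(yi, ys) for yi in ys]  # V_i = F_Y^e(Y_i)
    f_y = sum(gauss((y - yi) / h_y) for yi in ys) / (n * h_y)
    # Gaussian product kernel on the quantile scale (assumption of this sketch)
    c_uv = sum(gauss((u - ui) / h_u) * gauss((v - vi) / h_v)
               for ui, vi in zip(us, vs)) / (n * h_u * h_v)
    return f_y * c_uv


# Illustrative, strongly dependent data: y increases with x
xs = [float(i) for i in range(1, 21)]
ys = [i / 20.0 for i in range(1, 21)]

near = quantile_copula_density(0.25, 5.0, xs, ys, h_y=0.05, h_u=0.1, h_v=0.1)
far = quantile_copula_density(0.90, 5.0, xs, ys, h_y=0.05, h_u=0.1, h_v=0.1)
print(f"f(0.25 | x = 5) = {near:.3f}, f(0.90 | x = 5) = {far:.2e}")
```

No division by an estimated marginal of X appears, which is the source of the numerical stability listed among the advantages.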
Time-adaptive Estimator
Recursive Estimator (for stationary data streams; 1/n ≈ 0 when n → ∞)
$$\hat{f}_n(x) = \frac{n - 1}{n} \cdot \hat{f}_{n-1}(x) + \frac{1}{n \cdot h_i} \cdot K\left(\frac{x - X_i}{h_i}\right)$$
Exponential Smoothing (for nonstationary data streams)
$$\hat{f}_n(x) = \lambda \cdot \hat{f}_{n-1}(x) + \frac{1 - \lambda}{h_i} \cdot K\left(\frac{x - X_i}{h_i}\right)$$
forgetting factor: $\lambda = \frac{n}{n + 1}$
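A minimal sketch of the exponential smoothing update on a fixed evaluation grid, with a Gaussian kernel and a constant bandwidth. The class name, the grid, the initialization from the first observation and the synthetic stream are assumptions of this sketch.

```python
import math


def gauss(u):
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def forgetting_factor(n):
    """lambda = n / (n + 1), n being the effective number of remembered samples."""
    return n / (n + 1.0)


class TimeAdaptiveKDE:
    """Density on a grid, updated as f_n = lam * f_(n-1) + (1 - lam) / h * K(.)."""

    def __init__(self, grid, h, lam):
        self.grid = list(grid)
        self.h = h
        self.lam = lam
        self.density = None  # set by the first observation (assumption)

    def update(self, x_new):
        kernel = [gauss((g - x_new) / self.h) / self.h for g in self.grid]
        if self.density is None:
            self.density = kernel
        else:
            self.density = [self.lam * f + (1.0 - self.lam) * k
                            for f, k in zip(self.density, kernel)]

    def mode(self):
        best = max(range(len(self.grid)), key=lambda i: self.density[i])
        return self.grid[best]


# Synthetic stream with an abrupt change from 0.2 to 0.8 (concept drift)
grid = [i / 100.0 for i in range(101)]
estimator = TimeAdaptiveKDE(grid, h=0.05, lam=0.9)
for x in [0.2] * 50 + [0.8] * 50:
    estimator.update(x)
mode_after_shift = estimator.mode()
print(f"mode after the change: {mode_after_shift:.2f}")
print(f"lambda for n = 1000:   {forgetting_factor(1000):.4f}")
```

A smaller λ forgets old data faster; with λ = 0.9 the mass left from the first 50 observations is about 0.9^50, roughly 0.5%.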
Time-adaptive Quantile-Copula Estimator
time-adaptive empirical cumulative distribution function
$$F_t^e(x) = \lambda \cdot F_{t-1}^e(x) + (1 - \lambda) \cdot I(x_i \le x)$$
time-adaptive conditional KDE
$$\hat{f}_t(y \mid X = x) = \hat{f}_t(y) \cdot \hat{c}_t(u, v)$$
$$\hat{f}_t(y) = \lambda \cdot \hat{f}_{t-1}(y) + (1 - \lambda) \cdot K_h\left(\frac{y - Y_i}{h}\right)$$
$$\hat{c}_t(u, v) = \lambda \cdot \hat{c}_{t-1}(u, v) + (1 - \lambda) \cdot K_1\left(\frac{F_X^e(x) - F_X^e(X_i)}{h_q}\right) \cdot K_2\left(\frac{F_Y^e(y) - F_Y^e(Y_i)}{h_q}\right)$$
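The time-adaptive empirical cdf, F_t(x) = λ · F_{t-1}(x) + (1 − λ) · I(x_i ≤ x), can be sketched on a grid. The uniform starting cdf, the grid and the synthetic stream are assumptions of this sketch.

```python
class TimeAdaptiveECDF:
    """Empirical cdf on a grid with exponential forgetting."""

    def __init__(self, grid, lam):
        self.grid = list(grid)
        self.lam = lam
        # Assumption: start from a uniform cdf on [0, 1]
        self.cdf = [min(max(g, 0.0), 1.0) for g in self.grid]

    def update(self, x_new):
        # F_t(g) = lam * F_(t-1)(g) + (1 - lam) * I(x_new <= g) at each grid point g
        self.cdf = [self.lam * f + (1.0 - self.lam) * (1.0 if x_new <= g else 0.0)
                    for f, g in zip(self.cdf, self.grid)]


grid = [i / 10 for i in range(11)]
tracker = TimeAdaptiveECDF(grid, lam=0.95)
for _ in range(200):
    tracker.update(0.3)  # synthetic stream concentrated at 0.3

below = tracker.cdf[2]  # cdf at 0.2
above = tracker.cdf[4]  # cdf at 0.4
print(f"F(0.2) = {below:.4f}, F(0.4) = {above:.4f}")
```

The update is a convex combination of two non-decreasing functions, so the estimate remains a valid cdf at every step.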
Formulation of the Wind Power Forecast Problem
At time step t, forecast the wind power pdf for each look-ahead time step t+k of a given time horizon, knowing a set of explanatory variables (NWP forecasts, wind power measured values, hour of the day)
$$\hat{f}_P\left(p_{t+k} \mid X = x_{t+k|t}\right) = \frac{f_{P,X}\left(p_{t+k}, x_{t+k|t}\right)}{f_X\left(x_{t+k|t}\right)}$$
[Figure: wind power (p.u.) versus wind speed (m/s)]
Quantile-Copula Estimator vs Classical Multivariate KDE
[Figure: copula density c(u,z), with u=Fv(V) [wind speed] and z=Fp(P) [power], compared with the joint density function of wind speed (m/s) and power (p.u.)]
Kernel Choice
• Depends on the type and range of the variable
• In this problem we have two different types
– variables with range [0,1]: wind power and the quantile transforms u and v → beta kernels
– circular variables: hour of the day and wind direction → von Mises distribution
[Figure: density of wind power [p.u.] estimated with a beta kernel and with a Gaussian kernel (for variables with range ]-∞,+∞[)]
[Figure: von Mises distribution for k = 0.1, 1, 5, 30, 60; circular density plot from density.circular(x = data, bw = 1, kernel = "vonmises"), N = 5000, bandwidth = 1]
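Both kernel families can be sketched in a few lines. The beta kernel form below, with shape parameters x/b + 1 and (1 − x)/b + 1, is one common formulation and is an assumption here; the talk does not state which variant was used. The samples, the bandwidth b and the concentration κ are illustrative, and the samples are assumed to lie strictly inside (0, 1).

```python
import math


def beta_pdf(x, p, q):
    """Beta(p, q) density at x in (0, 1), computed in log space."""
    log_b = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return math.exp((p - 1.0) * math.log(x) + (q - 1.0) * math.log(1.0 - x) - log_b)


def beta_kde(x, samples, b):
    """Beta kernel estimate at x in [0, 1]; the kernel shape changes with x."""
    p, q = x / b + 1.0, (1.0 - x) / b + 1.0
    return sum(beta_pdf(s, p, q) for s in samples) / len(samples)


def bessel_i0(kappa, terms=100):
    """Modified Bessel function I0 from its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (kappa / 2.0) / k  # term = (kappa / 2)^k / k!
        total += term * term
    return total


def von_mises_pdf(theta, mu, kappa):
    """Von Mises density on the circle, with mean direction mu (radians)."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


# Wind power samples (p.u.), mostly close to zero
power = [0.02, 0.05, 0.03, 0.90, 0.01]
near_zero = beta_kde(0.03, power, b=0.05)
mid_range = beta_kde(0.50, power, b=0.05)

# The von Mises density integrates to 1 over the circle
step = 2.0 * math.pi / 360
circle_mass = sum(von_mises_pdf(k * step, math.pi, 5.0) for k in range(360)) * step
print(f"beta KDE at 0.03: {near_zero:.3f}, at 0.50: {mid_range:.5f}")
print(f"von Mises mass on the circle: {circle_mass:.6f}")
```

Unlike a Gaussian kernel, the beta kernel puts no mass outside [0, 1], and the von Mises kernel treats 0° and 360° (or hour 0 and hour 24) as the same point.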
Macro-beta Kernel
• Densities estimated with beta kernels
– may not integrate to 1
– are also inconsistent for distributions that have point masses at 0% and 100%
• This is due to the lack of normalization; the idea is a modified beta kernel estimator (named “macro-beta”)
$$\hat{f}'(x) = \frac{\hat{f}(x)}{\int_0^1 \hat{f}(x) \, dx}$$
normalization is applied to the conditional density function
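The normalization step can be sketched with a numerical integral over [0, 1]. The trapezoidal rule, the number of steps and the unnormalized example density are assumptions of this sketch; any estimated conditional density on [0, 1] could be passed in its place.

```python
def integrate_unit_interval(fn, steps=1000):
    """Trapezoidal rule for the integral of fn over [0, 1]."""
    h = 1.0 / steps
    total = 0.5 * (fn(0.0) + fn(1.0))
    total += sum(fn(k * h) for k in range(1, steps))
    return total * h


def normalize(fn, steps=1000):
    """Return f'(x) = f(x) / integral of f over [0, 1]."""
    mass = integrate_unit_interval(fn, steps)
    return lambda x: fn(x) / mass


def raw_density(x):
    # Hypothetical unnormalized conditional density with mass 0.7 on [0, 1]
    return 0.7 * 3.0 * x * x


raw_mass = integrate_unit_interval(raw_density)
normalized = normalize(raw_density)
new_mass = integrate_unit_interval(normalized)
print(f"mass before: {raw_mass:.4f}, after: {new_mass:.4f}")
```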
Case-Study and Evaluation Metrics
• NREL dataset: wind power generation of 15 sites
• U.S. Midwest wind farm
– 42 hrs ahead forecasts
– NWP produced by a GFS + WRF model chain
• Benchmark algorithms
– Splines quantile regression (or additive quantile regression)
– Nadaraya-Watson KDF estimator
• Evaluation metrics
– Calibration: deviation between forecasted and observed probabilities, e.g. 85% of the observed values should be lower than or equal to the 85% quantile
• primary requirement, measured by the deviation from perfect calibration
– Sharpness: mean size of the forecast intervals (target: narrow intervals)
– Skill score: combined information about the predictor’s performance in a single measure (the less negative the better, 0 for perfect probabilistic forecasts)
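The three metrics can be sketched as follows. The sign convention of the calibration deviation (observed minus nominal proportion) and the form of the skill score (a sum over quantiles of (indicator − α) · (observation − quantile), which is never positive) are assumptions consistent with the descriptions above; the data are made up.

```python
def calibration_deviation(quantile_forecasts, observations, alpha):
    """Observed minus nominal proportion, in percent, for the alpha quantile."""
    hits = sum(1 for q, y in zip(quantile_forecasts, observations) if y <= q)
    return 100.0 * (hits / len(observations) - alpha)


def sharpness(lower, upper):
    """Mean width of the forecast intervals."""
    return sum(u - l for l, u in zip(lower, upper)) / len(lower)


def skill_score(quantiles, observation):
    """Score for one observation; quantiles maps alpha -> forecast quantile."""
    return sum(((1.0 if observation <= q else 0.0) - a) * (observation - q)
               for a, q in quantiles.items())


# Ten observations (p.u.) and a constant 85% quantile forecast of 0.85
observations = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
q85 = [0.85] * len(observations)
deviation = calibration_deviation(q85, observations, 0.85)

width = sharpness([0.1, 0.2], [0.5, 0.4])
score_hit = skill_score({0.5: 0.4}, 0.4)   # observation equals the quantile
score_miss = skill_score({0.5: 0.4}, 0.6)  # observation above the median forecast
print(f"deviation: {deviation:.1f}%, width: {width:.2f}")
print(f"skill score: {score_hit:.2f} (hit), {score_miss:.2f} (miss)")
```

In this example 8 of the 10 observations fall below the 85% quantile, so the observed proportion is 5 points below the nominal one.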
Proof of Concept – Data with Artificial Change
NREL dataset: test dataset Oct-Dec, connection of 2 sites after Oct
[Figure: calibration deviation [%] versus nominal proportion rate [%] for the offline model and for the time-adaptive model with Lambda = 0.999, 0.995 and 0.99; the two sides of the zero line are annotated as quantile underestimation and quantile overestimation]
Evaluation Results – Offline Version
[Figure: Calibration, deviation [%] versus nominal proportion rate [%], for QC, NW and SplinesQR]
Evaluation Results – Offline Version
[Figure: Skill Score versus look-ahead time [hours], for QC, NW and SplinesQR]
[Figure: Sharpness, intervals mean length [% of rated power] versus nominal coverage rate [%], for QC, NW and SplinesQR]
Evaluation Results – Time-adaptive Version
[Figure: two calibration panels, deviation [%] versus nominal proportion rate [%], each comparing the offline model with the time-adaptive model for n=2738, n=1000 and n=200; panel labels as extracted: “Time-adaptive Nadaraya-Watson estimator” and “Quantile-copula Nadaraya-Watson estimator”]
Evaluation Results – Time-adaptive Version
[Figure: Sharpness, intervals mean length [% of rated power] versus nominal coverage rate [%], for the offline model and the time-adaptive model with n=2738, n=1000 and n=200]
[Figure: Skill Score versus look-ahead time [hours], for the offline model and the time-adaptive model with n=2738, n=1000 and n=200]
Kernel Density Forecast: Remarks
• The time-adaptive QC copes with evolving and non-Gaussian data
• Provides probabilistic forecasts that can be used in several decision-making problems
• The quantile-copula estimator tends to perform better in terms of calibration
• The quantile regression methods tend to perform better in terms of sharpness
• The skill scores of quantile-copula and quantile regression are rather similar
• The time-adaptive approach changes the bias of the probabilistic forecasts (calibration), while only slightly changing the sharpness and resolution
Application: Setting the Power System Operating Reserve
• The operation of Electric Energy Systems must take into account possible imbalances between generation and load
– Failures of the generating units
– Load forecast errors
– Wind, solar, etc. forecast errors
• This leads to the need for an Operating Reserve able to respond to possible problems
– Generating units that can take load immediately or in a short period
• The TSO (Transmission System Operator) must define this reserve for the next hours
• The recent increase in wind power penetration has made this exercise more difficult, due to the forecasting uncertainty
– in the recent past, TSOs defined the reserve based on empirical rules (such as the largest unit + 2% of the forecasted load)
Application: Setting the Power System Operating Reserve
Inputs
– W: Uncertain Wind Power Generation
– L: Uncertain Load
– C: Uncertain Conventional Generation
Probabilistic Model: System Generation Margin Model, M = (C + W) − L
→ Risk/Reserve Curve
→ Risk/Reserve Cost Curve (with the Reserve Cost)
→ Decision Aid (with the Decision Maker Preferences)
→ Preferred Reserve Level
Tool developed in the EU Project Anemos.plus and demonstrated during 6 months for an end-user
M.A. Matos and R.J. Bessa, “Setting the operating reserve using probabilistic wind power forecasts,” IEEE Transactions on Power Systems, vol. 26, no. 2, pp. 594-603, May 2011.
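The margin model M = (C + W) − L can be illustrated with a small Monte Carlo sketch that builds a risk/reserve curve, taking the risk as the probability that the margin plus the reserve is negative. All distributions and numbers below (unit sizes, outage rate, forecast error spreads) are hypothetical, and this risk definition is an assumption of the sketch rather than the method of the cited paper.

```python
import random


def sample_margin(rng):
    """One draw of M = (C + W) - L in MW, from hypothetical distributions."""
    # C: ten 100 MW units, each unavailable with probability 0.02
    conventional = sum(100.0 for _ in range(10) if rng.random() > 0.02)
    # W: wind power around a 200 MW forecast
    wind = max(0.0, rng.gauss(200.0, 40.0))
    # L: load around a 1150 MW forecast
    load = rng.gauss(1150.0, 30.0)
    return conventional + wind - load


def risk_reserve_curve(reserve_levels, n_samples=5000, seed=1):
    """Risk P(M + R < 0) for each reserve level R (MW)."""
    rng = random.Random(seed)
    margins = [sample_margin(rng) for _ in range(n_samples)]
    return [sum(1 for m in margins if m + r < 0.0) / n_samples
            for r in reserve_levels]


reserve_levels = [0.0, 50.0, 100.0, 150.0, 200.0, 600.0]
risk = risk_reserve_curve(reserve_levels)
for r, p in zip(reserve_levels, risk):
    print(f"reserve {r:5.0f} MW -> risk {p:.4f}")
```

Each point of the curve trades reserve against risk; the decision maker would pick the preferred level from such a curve together with the reserve cost.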
Avenues for Future Research
• Method/rule for computing the “optimal” kernel bandwidth for the wind power problem
• Simultaneous density forecast (includes the temporal correlation of forecast errors)
– the alternative is to have time trajectories (i.e. random vectors)
• Extreme events forecasting (e.g. extreme wind speed → disconnection of several wind farms)
• Evaluation of probabilistic forecasts (utility-based evaluation)