ETC5410: Nonparametric smoothing methods
July 2008
Rob J Hyndman
http://www.robjhyndman.com/
Outline
1 Density estimation
2 Kernel regression
3 Splines
4 Additive models
5 Functional data analysis
4. Additive models
1 Penalized regression splines
2 Mixed model representation
3 Additive models
4 Case study: electricity demand
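Penalized spline regression
Recall the linear version:
\[ r(x) = b(x)\beta \]
where
\[ b(x) = \begin{bmatrix} 1 & x & (x-\kappa_1)_+ & \cdots & (x-\kappa_K)_+ \end{bmatrix} \]
and $\beta = [\beta_0, \beta_1, u_1, \dots, u_K]'$ is chosen to minimize
\[ \|y - B\beta\|^2 \quad \text{subject to} \quad \beta' D \beta \le C \]
with $B = [b'(x_1), \dots, b'(x_n)]'$ and
\[ D = \begin{bmatrix} 0_{2\times 2} & 0_{2\times K} \\ 0_{K\times 2} & I_{K\times K} \end{bmatrix}. \]
Solution: $\hat\beta_\lambda = (B'B + \lambda^2 D)^{-1} B'y$.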
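The closed-form solution can be sketched in a few lines of base R. This is an illustration under assumptions, not the course code: the data are simulated, and the equally spaced knots and the two lambda values are arbitrary choices made only to show the effect of the penalty.

```r
# Simulated data (hypothetical; not the price/shipments data from the slides)
set.seed(1)
n <- 100
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Truncated-line basis b(x) = [1, x, (x - k_1)+, ..., (x - k_K)+]
K <- 10
knots <- seq(0.05, 0.95, length.out = K)   # assumed equally spaced knots
B <- cbind(1, x, outer(x, knots, function(a, k) pmax(a - k, 0)))

# D penalizes only the K knot coefficients u_1, ..., u_K
D <- diag(c(0, 0, rep(1, K)))

# beta_hat(lambda) = (B'B + lambda^2 D)^{-1} B'y
pen_spline <- function(lambda) {
  solve(crossprod(B) + lambda^2 * D, crossprod(B, y))
}

beta_small <- pen_spline(0.01)   # light penalty: wiggly fit
beta_large <- pen_spline(100)    # heavy penalty: close to a straight line
fit_small <- drop(B %*% beta_small)
fit_large <- drop(B %*% beta_large)
```

Plotting `fit_small` and `fit_large` against `x` shows the same progression from rough to smooth that the sequence of lambda values illustrates on the following slides.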
Penalized regression splines
[Figure: scatterplot of shipments against price (price axis from 500 to 850, shipments axis from 0 to 40), followed by the penalized regression spline fit for lambda = 10, 40, 70, 100, 130, 160, 190, 220, 250 and 280.]
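Mixed model representation
Split the B matrix in two:
\[ X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \quad\text{and}\quad Z = \begin{bmatrix} (x_1-\kappa_1)_+ & \cdots & (x_1-\kappa_K)_+ \\ \vdots & \ddots & \vdots \\ (x_n-\kappa_1)_+ & \cdots & (x_n-\kappa_K)_+ \end{bmatrix} \]
and let $\beta = [\beta_0, \beta_1]'$ and $u = [u_1, \dots, u_K]'$. Then we want to minimize
\[ \|y - X\beta - Zu\|^2 + \lambda^2 \|u\|^2. \]
This is equivalent to finding the Best Linear Unbiased Predictor (BLUP) of the mixed model
\[ y = X\beta + Zu + \varepsilon \]
where $u_i \sim N(0, \sigma_u^2)$ and $\varepsilon_j \sim N(0, \sigma_\varepsilon^2)$.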
Advantages
Automatic penalty selection: use REML.
Easy to fit using standard software.
Easy to develop a Bayesian version.
Formulas
Let $\lambda = \sigma_\varepsilon/\sigma_u$ and $V = \mathrm{Cov}(y) = \sigma_u^2 ZZ' + \sigma_\varepsilon^2 I$. Then
\[ \hat\beta = (X'V^{-1}X)^{-1} X'V^{-1}y, \]
\[ \hat u = \sigma_u^2 Z'V^{-1}(y - X\hat\beta). \]
V is estimated using profile log-likelihood methods.
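A quick numerical check of the equivalence may help. The sketch below uses simulated data and treats the variance components as known (in practice they would be estimated, e.g. by REML); it computes the generalized least squares estimate of $\beta$ and the BLUP of $u$, then compares them with the penalized least squares solution using $\lambda = \sigma_\varepsilon/\sigma_u$. All variable names and values are illustrative assumptions.

```r
set.seed(2)
n <- 80
x <- sort(runif(n))
y <- cos(3 * x) + rnorm(n, sd = 0.2)

K <- 8
knots <- seq(0.1, 0.9, length.out = K)
X <- cbind(1, x)
Z <- outer(x, knots, function(a, k) pmax(a - k, 0))

# Assumed (not estimated) variance components, for illustration only
sigma_e <- 0.2
sigma_u <- 2
lambda <- sigma_e / sigma_u

# Mixed-model formulas: GLS estimate of beta and BLUP of u
V <- sigma_u^2 * tcrossprod(Z) + sigma_e^2 * diag(n)
Vinv <- solve(V)
beta_hat <- solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y)
u_hat <- sigma_u^2 * t(Z) %*% Vinv %*% (y - X %*% beta_hat)

# Penalized least squares with the same lambda
B <- cbind(X, Z)
D <- diag(c(0, 0, rep(1, K)))
theta <- solve(crossprod(B) + lambda^2 * D, crossprod(B, y))

# Largest absolute difference between the two sets of coefficients
max_diff <- max(abs(theta - rbind(beta_hat, u_hat)))
```

The two routes should agree up to rounding error, which is the content of the equivalence stated above.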
Choice of knots
Provided the set of knots is relatively dense with respect to the $\{x_i\}$, the result hardly changes.
Choose enough knots to model the structure, but not so many as to cause computational problems.
RWC (Ruppert, Wand and Carroll) recommend:
$K = \min(m/4, 35)$ knots, where m = number of unique observations.
$\kappa_j = \left(\frac{j+1}{K+2}\right)$th sample quantile of the unique $\{x_i\}$, for $j = 1, \dots, K$.
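The knot rule is easy to sketch in base R. The helper below is hypothetical (it is not a function from SemiPar) and reads the RWC recommendation as K = min(m/4, 35) knots placed at the ((j+1)/(K+2))th sample quantiles of the unique x values; rounding K down and forcing at least one knot are additional assumptions.

```r
# Hypothetical helper (not part of any package): knots following the RWC-style rule
rwc_knots <- function(x) {
  ux <- unique(x)
  m <- length(ux)
  K <- max(1, floor(min(m / 4, 35)))    # assumption: round down, at least one knot
  probs <- (seq_len(K) + 1) / (K + 2)   # (j + 1) / (K + 2), j = 1, ..., K
  unname(quantile(ux, probs))
}

# Simulated predictor on roughly the same scale as the price variable
set.seed(3)
x <- round(runif(200, 500, 850))
kn <- rwc_knots(x)
length(kn)
range(kn)
```

Placing knots at quantiles of the unique values keeps them dense where the data are dense, and all knots fall strictly inside the range of the data.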
Example
[Figure: scatterplot against price (price axis from 500 to 850, vertical axis from 10 to 35).]
Implementation in R
# spm() is provided by the SemiPar package; price and shipments are assumed to be in the workspace
library(SemiPar)
fit <- spm(shipments ~ f(price))
plot(fit, col = "red", lwd = 2, shade.col = "yellow", rug.col = "blue")
points(price, shipments, col = "blue")
Additive models
One way around the curse of dimensionality is to assume the surface is additive:
\[ r(x) = r_0 + \sum_{i=1}^{m} r_i(x_i). \]
Restricts complexity of surfaces but still allows a much richer class of surfaces than parametric models.
Need to estimate m one-dimensional functions instead of one m-dimensional function.
Usually have m different bandwidths to select when fitting an additive model.
Generalization of the multiple regression model
\[ Y = \beta_0 + \sum_{i=1}^{m} \beta_i x_i \]
which is also additive in its predictors. However, additive models do not assume the dependence is linear.
Estimated functions, $\hat r_i$, are analogues of coefficients in linear regression.
Interpretation is easy with additive structure.
Body fat prediction
siri      Percent body fat using Siri's equation
age       Age (yrs)
weight    Weight (lbs)
height    Height (inches)
adipos    Adiposity index = Weight/Height^2 (kg/m^2)
neck      Neck circumference (cm)
chest     Chest circumference (cm)
abdom     Abdomen circumference (cm) at the umbilicus and level with the iliac crest
hip       Hip circumference (cm)
thigh     Thigh circumference (cm)
knee      Knee circumference (cm)
ankle     Ankle circumference (cm)
biceps    Extended biceps circumference (cm)
forearm   Forearm circumference (cm)
wrist     Wrist circumference (cm) distal to the styloid processes
Implementation in R
# Assumes a data frame `fat` with the body fat variables is loaded, and library(SemiPar) for spm()
# Keep only cases with height above 50 inches and weight below 300 lbs
fat <- fat[fat$height > 50 & fat$weight < 300, ]
attach(fat)
fit <- spm(siri ~ f(age) + f(height) + f(weight) +
           f(abdom) + f(adipos) + f(neck) +
           f(chest) + f(hip) + f(thigh))
summary(fit)
Additive models
Estimate each function using a univariate smoother.
Any of the functions $r_i$ can be fitted linearly by using the linear regression 'smoother' for that variable.
Categorical variables are easily incorporated by fitting a constant for each level of the variable.
Can allow interactions between two variables by fitting a bivariate surface to the partial residuals.
Backfitting algorithm
Estimate additive models by estimating each function using a univariate smoother.
The backfitting algorithm is an iterative procedure for fitting additive models.
Consider the conditional expectation
\[ E\Big(Y - r_0 - \sum_{j \ne k} r_j(x_j) \,\Big|\, x_k\Big) = r_k(x_k). \]
True for $k = 1, \dots, m$.
Using this result and given estimates of $r_j$ for $j = 1, \dots, m$, we compute an improved estimate of $r_k$ as follows.
Let $e_{i|k} \leftarrow y_i - \hat r_0 - \sum_{j \ne k} \hat r_j(x_{ij})$, $i = 1, \dots, n$, where $x_{ij}$ is the ith observation on the jth predictor.
Then $\hat r_k \leftarrow \text{smooth}\{(x_{ik}, e_{i|k})\}$.
These improved estimates are computed for $k = 1, \dots, m$. Then the whole cycle is repeated until the individual functions don't change.
Initialize $\hat r_j(x) = 0$, $j = 1, \dots, m$.
Alternatively, initialize $\hat r_j(x) = \hat\beta_j x$, where $\hat\beta_j x$ is the term from a linear regression.
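The cycle above can be sketched in base R as follows. This is a minimal illustration on simulated data, not the course implementation: `smooth.spline` with 5 degrees of freedom is an assumed choice of univariate smoother, the intercept is taken to be the mean of y, and each fitted function is centred after smoothing (a common identifiability step that the slides do not mention).

```r
set.seed(4)
n <- 200
X <- cbind(runif(n), runif(n))
y <- 2 + sin(2 * pi * X[, 1]) + (X[, 2] - 0.5)^2 + rnorm(n, sd = 0.2)

m <- ncol(X)
r0 <- mean(y)            # intercept estimate
r <- matrix(0, n, m)     # r[, j] holds r_j at the observed points; initialized to 0

for (iter in 1:50) {
  r_old <- r
  for (k in 1:m) {
    # Partial residuals: remove the intercept and all other fitted functions
    e <- y - r0 - rowSums(r[, -k, drop = FALSE])
    fit <- smooth.spline(X[, k], e, df = 5)   # assumed smoother and df
    r[, k] <- predict(fit, X[, k])$y
    r[, k] <- r[, k] - mean(r[, k])           # centre (added assumption)
  }
  # Stop when the individual functions no longer change
  if (max(abs(r - r_old)) < 1e-6) break
}

fitted_vals <- r0 + rowSums(r)
```

Plotting `r[, 1]` against `X[, 1]` and `r[, 2]` against `X[, 2]` gives the estimated component functions, which can be compared with the sine and quadratic terms used in the simulation.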
Convergence
Convergence of the backfitting algorithm is not always guaranteed.
Convergence has been proven in some cases, including smoothing splines.
Unproven for locally-weighted regressions.
Seems to work well in practice.
Coplots
Graphical representation of a multidimensional surface.
Plot slices of the surface by plotting the value of the surface against one of the explanatory variables while holding all other variables fixed.
If the slices all show a similar shape (apart from a shift up or down), then there is no interaction between variables. An additive model might be preferable.
If the slices show changes in shape, there is interaction. An additive model is not appropriate.
Inference for Additive Models
Each fitted function can be written as a linear smoother $\hat r_i = S_i y$ for some $n \times n$ matrix $S_i$.
Assuming iid errors, $\mathrm{Cov}(\hat r_i) = S_i S_i^T \sigma^2$ where $\sigma^2 = V(Y_j)$.
Since $\hat r(x)$ is simply a linear function of the individual functions $\hat r_i$, the additive model is also a linear smoother.
Denote the smoothing matrix as S:
\[ \hat r(x) = Sy = \hat r_0 \mathbf{1} + \sum_{j=1}^{m} S_j y \]
where $\mathbf{1} = [1, 1, \dots, 1]^T$. Then $S = \sum_{i=0}^{m} S_i$ where $S_0$ is such that $S_0 y = \hat r_0 \mathbf{1}$.
Thus all inference results for linear smoothers may be applied to the additive model.
Degrees of freedom of additive model
Need a measure of df associated with each predictor. Let $S_{(i)}$ denote the smoothing matrix that would be obtained if we omitted $x_i$ from the predictor space. Then the df due to $x_i$ is defined to be
\[ \mathrm{df}_i = \mathrm{tr}(2S - SS^T) - \mathrm{tr}(2S_{(i)} - S_{(i)}S_{(i)}^T). \]
Derive approximate F tests for the ith predictor in this way.
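To make the definition concrete, the sketch below computes $\mathrm{df}_1$ for a two-predictor additive model. It is an illustration under assumptions: rather than backfitting, the additive model is fitted jointly as a penalized spline with truncated-line bases, so the smoothing matrix is available in closed form; the data, the number of knots and the value of lambda are all arbitrary, and the helper names are hypothetical.

```r
set.seed(5)
n <- 150
x1 <- runif(n)
x2 <- runif(n)

# Truncated-line basis without intercept: [x, (x - k_1)+, ..., (x - k_K)+]
tl_basis <- function(x, K = 8) {
  knots <- seq(0.1, 0.9, length.out = K)
  cbind(x, outer(x, knots, function(a, k) pmax(a - k, 0)))
}

# Smoother matrix S = C (C'C + lambda^2 D)^{-1} C' for a list of predictors
smoother_matrix <- function(xs, lambda = 1, K = 8) {
  ones <- rep(1, length(xs[[1]]))
  C <- do.call(cbind, c(list(ones), lapply(xs, tl_basis, K = K)))
  d <- c(0, rep(c(0, rep(1, K)), length(xs)))   # penalize knot coefficients only
  C %*% solve(crossprod(C) + lambda^2 * diag(d), t(C))
}

# tr(2S - SS')
df_err <- function(S) sum(diag(2 * S - S %*% t(S)))

S_full <- smoother_matrix(list(x1, x2))
S_wo1 <- smoother_matrix(list(x2))      # S_(1): x1 omitted
df1 <- df_err(S_full) - df_err(S_wo1)
df1
```

With this basis, `df1` lies between 0 and 9 (one linear term plus eight knot coefficients); a heavier penalty pulls it down towards the single degree of freedom of a linear term.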
Generalised additive models
What if Y is non-Gaussian (e.g. binary (0,1))? We want to extend additive models in the same way as linear models have been extended to generalised linear models.
A generalised additive model (GAM) is defined by specifying:
1 the distribution of the response variable
2 the link function g: $g(\mu) = r_0 + \sum_{j=1}^{m} r_j(x_j)$ where $\mu = E(Y \mid x_1, \dots, x_m)$
Examples:
Y binary and $g(\mu) = \log[\mu/(1-\mu)]$. This is a logistic additive model.
Y normal and $g(\mu) = \mu$. This is a standard additive model.
Estimation
Hastie and Tibshirani describe a method for fitting GAMs known as "local scoring", which is an extension of the Fisher scoring procedure.
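The idea behind local scoring can be sketched for a logistic additive model with a single smooth term. This is a simplified illustration under assumptions, not Hastie and Tibshirani's implementation: with one predictor the inner backfitting loop reduces to a single weighted smooth, `smooth.spline` with 4 degrees of freedom is an assumed smoother, and the data are simulated.

```r
set.seed(6)
n <- 300
x <- runif(n)
p_true <- plogis(3 * sin(2 * pi * x))
y <- rbinom(n, 1, p_true)

# Start from the constant model on the logit scale
eta <- rep(qlogis(mean(y)), n)

for (iter in 1:20) {
  mu <- plogis(eta)              # inverse logit link
  w <- mu * (1 - mu)             # Fisher scoring weights
  z <- eta + (y - mu) / w        # working response
  # A weighted smooth of z on x replaces the weighted least squares step of a GLM
  fit <- smooth.spline(x, z, w = w, df = 4)
  eta_new <- predict(fit, x)$y
  converged <- max(abs(eta_new - eta)) < 1e-6
  eta <- eta_new
  if (converged) break
}

mu_hat <- plogis(eta)            # fitted probabilities
```

With several predictors, the weighted smooth inside the loop would be replaced by a weighted backfitting pass over all the component functions.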
Forecasting electricity demand
The problem
We want to forecast the peak electricity demand in a half-hour period in ten years' time.
We have ten years of half-hourly electricity data, temperature data and some economic and demographic data.
The location is South Australia: home to the most volatile electricity demand in the world.
Sounds impossible?
Demand data
[Figure: South Australian operational demand (summer 06/07). Vertical axis: SA demand (GW), 1.0 to 2.5; horizontal axis: Nov 06 to Mar 07.]
[Figure: SA demand (first 3 weeks of January 2007). Vertical axis: SA demand (GW), 1.0 to 2.5; horizontal axis: date in January, 1 to 21.]
[Figure: demand boxplots.]
[Figure: temperature data.]
Demand drivers
calendar effects
prevailing weather conditions (and the timing of those conditions)
climate changes
economic and demographic changes
changing technology
Modelling framework
Semi-parametric additive models with correlated errors.
Each half-hour period modelled separately.
Variables selected to provide best out-of-sample predictions for 2005/06 summer.
Equations
\[ \log(y_{t,p}) = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t \]
$y_{t,p}$ denotes demand at time t (measured in half-hourly intervals) during period p, $p = 1, \dots, 48$;
$h_p(t)$ models all calendar effects;
$f_p(w_{1,t}, w_{2,t})$ models all temperature effects, where $w_{1,t}$ is a vector of recent temperatures at Kent Town and $w_{2,t}$ is a vector of recent temperatures at Adelaide airport;
$z_{j,t}$ is a demographic or economic variable at time t;
$n_t$ denotes the model error at time t.
Forecasting electricity demand 10
Equations
$$\log(y_{t,p}) = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$$
$h_p(t)$ handles annual, weekly and daily seasonal patterns as well as public holidays:
$$h_p(t) = \ell_p(t) + \alpha_{t,p} + \beta_{t,p} + \gamma_{t,p} + \delta_{t,p}$$
$\ell_p(t)$ is the "time of summer" effect (a regression spline);
$\alpha_{t,p}$ is the day of week effect;
$\beta_{t,p}$ is the "holiday" effect;
$\gamma_{t,p}$ is the New Year's Eve effect;
$\delta_{t,p}$ is the millennium effect.
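The day of week and holiday effects are categorical, so each date needs a factor level before the model can be fitted. Below is a rough sketch of one way to code them, using the four holiday levels that appear in the fitted-results plot (Normal, Day before, Holiday, Day after). The function name `calendar_factors`, the holiday set, and the tie-breaking order when a day is adjacent to two holidays are assumptions made for illustration.

```python
from datetime import date, timedelta

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def calendar_factors(day, holidays):
    """Return (day-of-week level, holiday level) for one date.

    The order of the checks is an arbitrary choice: a day that is itself
    a holiday is always coded "Holiday", and "Day before" takes
    precedence over "Day after".
    """
    if day in holidays:
        holiday_level = "Holiday"
    elif day + timedelta(days=1) in holidays:
        holiday_level = "Day before"
    elif day - timedelta(days=1) in holidays:
        holiday_level = "Day after"
    else:
        holiday_level = "Normal"
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return DAY_NAMES[day.weekday()], holiday_level


# Illustrative holiday set containing a single date.
holidays = {date(2006, 1, 26)}
for offset in range(-2, 2):
    d = date(2006, 1, 26) + timedelta(days=offset)
    print(d, calendar_factors(d, holidays))
```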
Forecasting electricity demand 11
Fitted results (3pm)
[Figure: fitted effects on demand at 3:00 pm. Panels: day of summer (0 to 150), day of week (Mon to Sun), holiday (Normal, Day before, Holiday, Day after). Vertical axis: effect on demand.]
Forecasting electricity demand 12
Equations
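$$\log(y_{t,p}) = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$$
$$f_p(w_{1,t}, w_{2,t}) = \sum_{k=0}^{6} \left[ f_{k,p}(x_{t-k}) + g_{k,p}(d_{t-k}) \right] + q_p(x_t^+) + r_p(x_t^-) + s_p(\bar{x}_t) + \sum_{j=1}^{6} \left[ F_{j,p}(x_{t-48j}) + G_{j,p}(d_{t-48j}) \right]$$
$x_t$ is the average temperature across sites at time $t$;
$d_t$ is the temperature difference between sites at time $t$;
$x_t^+$ is the maximum of the $x_t$ values in the past 24 hours;
$x_t^-$ is the minimum of the $x_t$ values in the past 24 hours;
$\bar{x}_t$ is the average temperature in the past seven days.
Each function is smooth and estimated using regression splines.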
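The arguments of the smooth functions above can be assembled from the two half-hourly temperature series before any spline is fitted. This sketch only builds the predictors; it does not estimate the functions. Several details are assumptions rather than anything stated on the slide: the sign of the difference (Kent Town minus airport), windows that include the current half-hour, and the name `temperature_predictors`.

```python
HALF_HOURS_PER_DAY = 48


def temperature_predictors(kent_town, airport, t):
    """Build the inputs to f_p at half-hourly index t.

    kent_town, airport: equal-length lists of half-hourly temperatures.
    Windows end at (and include) index t; that is an assumption.
    """
    week = 7 * HALF_HOURS_PER_DAY
    if t < week - 1:
        raise ValueError("need seven days of history up to index t")
    # x_t: average across the two sites; d_t: difference between them.
    x = [(a + b) / 2 for a, b in zip(kent_town, airport)]
    d = [a - b for a, b in zip(kent_town, airport)]
    last_day = x[t - HALF_HOURS_PER_DAY + 1 : t + 1]
    last_week = x[t - week + 1 : t + 1]
    return {
        "x_lags": [x[t - k] for k in range(7)],  # k = 0, ..., 6
        "d_lags": [d[t - k] for k in range(7)],
        "x_max": max(last_day),                  # x_t^+
        "x_min": min(last_day),                  # x_t^-
        "x_bar": sum(last_week) / len(last_week),
        # Same time of day on each of the previous six days (j = 1, ..., 6).
        "x_day_lags": [x[t - HALF_HOURS_PER_DAY * j] for j in range(1, 7)],
        "d_day_lags": [d[t - HALF_HOURS_PER_DAY * j] for j in range(1, 7)],
    }


# Made-up temperatures: a steady ramp, with the airport 2 degrees cooler.
kent_town = [float(i) for i in range(400)]
airport = [float(i) - 2.0 for i in range(400)]
features = temperature_predictors(kent_town, airport, 399)
print(features["x_max"], features["x_min"], features["x_bar"])
```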
Forecasting electricity demand 13
Fitted results (3pm)
[Figure: fitted temperature effects on demand at 3:00 pm. Panels: temperature; lag 1 to lag 5 temperature; lag 1 day and lag 2 day temperature; last week average temp; previous max temp; previous min temp; temperature differential; lag 1 to lag 6 temp differential; lag 1 day temp differential. Vertical axis: effect on demand.]
Forecasting electricity demand 14
Equations
$$\log(y_{t,p}) = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$$
Other variables described by linear relationships with coefficients $c_1, \dots, c_J$.
Estimation based on annual data.

Variable               Coefficient   Std. Error   t value    P value
Intercept                 −0.13981      0.04338    −3.222   0.018094
Gross State Product        0.01684      0.00108    15.649   0.000004
Lag Price                 −0.04957      0.00727    −6.818   0.000488
Cooling Degree Days        0.36300      0.01716    21.157   0.000001
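As a quick reading aid for the table, each t value is the coefficient divided by its standard error. The sketch below recomputes that ratio from the printed figures. Small differences from the reported t values are expected because the table shows rounded coefficients and standard errors; the variable names are illustrative.

```python
# (coefficient, standard error, reported t value) copied from the table.
rows = {
    "Intercept": (-0.13981, 0.04338, -3.222),
    "Gross State Product": (0.01684, 0.00108, 15.649),
    "Lag Price": (-0.04957, 0.00727, -6.818),
    "Cooling Degree Days": (0.36300, 0.01716, 21.157),
}

# t statistic recomputed as coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se, _) in rows.items()}

for name, (_, _, t_reported) in rows.items():
    print(f"{name:<20} recomputed {t_stats[name]:8.3f}   reported {t_reported:8.3f}")
```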
Forecasting electricity demand 15
Predictions
[Figure: actual and fitted annual demand by year, 1998 to 2006.]
Forecasting electricity demand 16
Predictions
[Figure: R-squared (%) by time of day, from 12 midnight to 12 midnight.]
Forecasting electricity demand 17
Predictions
[Figure: two time plots, actual demand and predicted demand, 1998 to 2006.]
Forecasting electricity demand 18
Predictions
[Figure: actual demand plotted against predicted demand.]
Forecasting electricity demand 19
Peak demand forecasting
$$\log(y_{t,p}) = h_p(t) + f_p(w_{1,t}, w_{2,t}) + \sum_{j=1}^{J} c_j z_{j,t} + n_t$$
Multiple alternative futures created by:
resampling residuals using a seasonal bootstrap;
generating simulations of future temperature patterns based on seasonally bootstrapping past temperatures (with some adjustment for extremes and climate change);
using assumed values for GSP and Price.
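The slide does not spell out the seasonal bootstrap, so the following is only one simple variant, offered as a sketch: a simulated season is pieced together from blocks of consecutive days, each block copied from a randomly chosen historical season at the same position within the season. The block length, the function name `seasonal_bootstrap` and the toy data are assumptions; the adjustment for extremes and climate change is not attempted.

```python
import random


def seasonal_bootstrap(seasons, block_days, rng):
    """Build one simulated season from historical seasons.

    seasons: list of seasons; each season is a list of daily records
             (for example, a list of 48 half-hourly residuals per day).
             All seasons must have the same number of days.
    Each block of `block_days` consecutive days is copied from a randomly
    chosen historical season, so every day keeps its position within the
    season and serial correlation inside a block is preserved.
    """
    n_days = len(seasons[0])
    if any(len(s) != n_days for s in seasons):
        raise ValueError("all seasons must have the same number of days")
    simulated = []
    for start in range(0, n_days, block_days):
        source = rng.choice(seasons)
        simulated.extend(source[start:start + block_days])
    return simulated


# Toy data: 3 seasons of 10 days, each day tagged (season, day index).
seasons = [[(year, day) for day in range(10)] for year in range(3)]
sim = seasonal_bootstrap(seasons, block_days=5, rng=random.Random(1))
print(sim)
```

Repeating the call with different random draws would give the multiple alternative futures; the same idea could be applied to past temperatures.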
Forecasting electricity demand 20
Peak demand distribution
[Figure: densities of demand (GW) in panels titled Low, Base and High, with one curve per year from 2007/2008 to 2017/2018.]
Forecasting electricity demand 21
Peak demand distribution
[Figure: densities of annual maximum demand (GW) in panels titled Low, Base and High, with one curve per year from 2007/2008 to 2017/2018.]
Forecasting electricity demand 22
Peak demand distribution
[Figure: plot by year (2000 to 2015) with curves labelled by probability of exceedance in one year: 90%, 50%, 10% and 2%.]
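Given simulated annual maxima for one year, the demand level with a given probability of exceedance is the matching upper quantile: the 10% level is the 90th percentile of the simulated maxima. Here is a minimal sketch using an empirical quantile with linear interpolation. The function name `poe_level` and the evenly spaced "simulated" values are placeholders for illustration, not results from the model.

```python
def poe_level(simulated_maxima, poe):
    """Demand level exceeded with probability `poe` in one year.

    Estimated as the empirical (1 - poe) quantile of the simulated
    annual maxima, with linear interpolation between order statistics.
    """
    if not 0 < poe < 1:
        raise ValueError("poe must be strictly between 0 and 1")
    values = sorted(simulated_maxima)
    pos = (1 - poe) * (len(values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(values) - 1)
    frac = pos - lower
    return values[lower] + frac * (values[upper] - values[lower])


# Placeholder maxima in GW: 101 evenly spaced values from 3.0 to 4.0.
simulated_maxima = [3.0 + 0.01 * i for i in range(101)]
levels = {poe: poe_level(simulated_maxima, poe) for poe in (0.9, 0.5, 0.1, 0.02)}
for poe, level in levels.items():
    print(f"{poe:.0%} probability of exceedance: {level:.2f} GW")
```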