STA 6857—Model Selection & Exploratory Data Analysis (§2.2 cont., 2.3)
Model Selection Removing the Trend Homework 2c
Outline
1 Model Selection
2 Removing the Trend
3 Homework 2c
Arthur Berg STA 6857—Model Selection & Exploratory Data Analysis (§2.2 cont., 2.3) 2/ 27
Model Selection with the F-Statistic
Forward Step — Start with a minimal model. Increase the model by adding in the variable which produces the largest F-statistic. Repeat until no additional variable produces a significant F-statistic.

Backward Step — Start with a full model. Decrease the model by removing the variable which gives the smallest contribution. Repeat until all remaining variables are significant.

Forward/Backward Step — Increase the model by adding in the variable which produces the largest F-statistic. Decrease the model by removing the weakest variable as long as its contribution is not significant. Repeat until no additional variable produces a significant F-statistic and all variables in the model are significant.
Example
Minimal Model: xt = β1 + wt
Complete Model: xt = β1 + β2 t + β3 t^2 + β4 sin t + wt

Use the above procedures to find a model in between the minimal and complete models, like: xt = β1 + β4 sin t + wt.
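As a complement to the R session on the following slides, the backward-step idea can be sketched numerically. The sketch below is in Python/NumPy rather than the course's R; the simulated data, candidate columns, and F cutoff are all hypothetical choices for illustration.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def backward_step(X, y, f_crit=4.0):
    """Backward elimination: repeatedly drop the column whose partial
    F-statistic is smallest, as long as that F is below f_crit.
    Column 0 (the intercept) is always kept."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        rss_full = rss(X[:, keep], y)
        df = len(y) - len(keep)
        # partial F for dropping column j: ((RSS_reduced - RSS_full)/1) / (RSS_full/df)
        fs = {j: (rss(X[:, [c for c in keep if c != j]], y) - rss_full) / (rss_full / df)
              for j in keep[1:]}
        weakest = min(fs, key=fs.get)
        if fs[weakest] >= f_crit:
            break  # every remaining variable contributes significantly
        keep.remove(weakest)
    return keep

# Simulated data: the true model only involves the intercept and linear term.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 101)
y = 1 + t + rng.normal(size=t.size)
X = np.column_stack([np.ones_like(t), t, t**2, np.sin(t)])
print(backward_step(X, y))  # the intercept (0) and linear term (1) survive
```

The partial F here compares the RSS with and without a single column, which mirrors the F-statistic comparisons that `anova()` reports in the R example that follows.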
Example Using Backward Step with the F-Statistic in R
You may wish to view my slides on Linear Models with R.

> x <- seq(0, 10, .1)
> y <- 1 + x + cos(x) + rnorm(x)
> yy <- 1 + x + cos(x)
> plot(x, y, lwd = 3)
> lines(x, yy, lwd = 3)
Example in R (cont.)

> f1 <- lm(y ~ x + I(x^2) + I(x^3) + sin(x) + cos(x) + exp(x))
> anova(f1)
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq  F value    Pr(>F)
x          1 692.94  692.94 515.5497 < 2.2e-16 ***
I(x^2)     1   0.03    0.03   0.0214 0.8841297
I(x^3)     1  19.69   19.69  14.6468 0.0002335 ***
sin(x)     1   1.42    1.42   1.0552 0.3069485
cos(x)     1  11.01   11.01   8.1934 0.0051832 **
exp(x)     1   0.03    0.03   0.0192 0.8900193
Residuals 94 126.34    1.34
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> f2 <- lm(y ~ x + I(x^2) + I(x^3) + sin(x) + cos(x))
> anova(f2)
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq  F value    Pr(>F)
x          1 692.94  692.94 520.9277 < 2.2e-16 ***
I(x^2)     1   0.03    0.03   0.0216 0.8835280
I(x^3)     1  19.69   19.69  14.7996 0.0002164 ***
sin(x)     1   1.42    1.42   1.0662 0.3044251
cos(x)     1  11.01   11.01   8.2789 0.0049533 **
Residuals 95 126.37    1.33
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Example in R (cont.)
> f3 <- lm(y ~ x + I(x^3) + sin(x) + cos(x))
> anova(f3)
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq  F value   Pr(>F)
x          1 692.94  692.94 526.2840 < 2.2e-16 ***
I(x^3)     1   0.34    0.34   0.2549   0.6148
sin(x)     1   0.16    0.16   0.1231   0.7265
cos(x)     1  31.62   31.62  24.0135 3.88e-06 ***
Residuals 96 126.40    1.32
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> f4 <- lm(y ~ x + I(x^3) + cos(x))
> anova(f4)
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq  F value    Pr(>F)
x          1 692.94  692.94 531.1043 < 2.2e-16 ***
I(x^3)     1   0.34    0.34   0.2573    0.6132
cos(x)     1  31.62   31.62  24.2370 3.492e-06 ***
Residuals 97 126.56    1.30
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Example in R (cont.)
> f5 <- lm(y ~ x + cos(x))
> anova(f5)
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x          1 692.94  692.94 533.564 < 2.2e-16 ***
cos(x)     1  31.24   31.24  24.057 3.717e-06 ***
Residuals 98 127.27    1.30
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> lines(x, predict(f5), lwd=3, col="magenta")
Estimation of σ²

The unbiased estimate of the variance of the WN, σ², based on a model with k coefficients is given by

s²_k = RSS_k / (n − k)

The maximum likelihood estimate of the variance with k coefficients is given by

σ̂²_k = RSS_k / n

From classical estimation of the variance:

Unbiased: (1/(n−1)) ∑ (xi − x̄)²
ML: (1/n) ∑ (xi − x̄)²
Best (in terms of MSE and under normality): (1/(n+1)) ∑ (xi − x̄)²

where each sum runs over i = 1, …, n.
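The three denominators can be checked on a toy sample. A quick Python illustration (the data values here are made up):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)
ss = np.sum((x - x.mean()) ** 2)  # sum of squared deviations; here 32

s2_unbiased = ss / (n - 1)  # divide by n-1: unbiased estimator
s2_ml       = ss / n        # divide by n:   maximum likelihood estimator
s2_best     = ss / (n + 1)  # divide by n+1: smallest MSE under normality

print(s2_unbiased, s2_ml, s2_best)  # the estimates shrink left to right
```

The three estimates are ordered: dividing by a larger denominator shrinks the estimate, trading a little bias for lower variance.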
Akaike’s Information Criterion (AIC)
General model selection idea:

Criterion = Model Error + Penalty for Model Complexity

where the model error term (think of RSS) should be as small as possible, and the penalty term increases with the number of parameters.

Definition (AIC)
The AIC for k parameters is defined as

AIC = ln σ̂²_k + (n + 2k)/n

Definition (AICc — AIC for small sample sizes (small n))
The bias-corrected AIC (AICc) for k parameters is defined as

AICc = ln σ̂²_k + (n + k)/(n − k − 2)
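Both criteria are computable from RSS alone. A small Python helper, evaluated on numbers from the mortality example later in these slides (model 4 there has RSS = 20,509 with 5 coefficients; n = 508 is inferred from the reported 503 residual degrees of freedom). Note that R's built-in `AIC()` counts σ² as an extra parameter, so its value differs slightly from this slide's formula.

```python
import math

def aic(rss_k, n, k):
    """AIC as defined above: ln(sigma2_k) + (n + 2k)/n, with sigma2_k = RSS_k/n."""
    return math.log(rss_k / n) + (n + 2 * k) / n

def aicc(rss_k, n, k):
    """Bias-corrected AIC: ln(sigma2_k) + (n + k)/(n - k - 2)."""
    return math.log(rss_k / n) + (n + k) / (n - k - 2)

# Model 4 of the mortality example: RSS = 20,509, n = 508, k = 5 coefficients.
print(round(aic(20509, 508, 5), 2))   # about 4.72
print(round(aicc(20509, 508, 5), 2))  # about 4.72, matching the table later
```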
Schwarz’s Information Criterion (SIC) aka Bayesian Information Criterion (BIC)

Definition (SIC/BIC)
The SIC for k parameters is defined as

SIC = ln σ̂²_k + (k ln n)/n

The penalty term in the AIC is 2k/n; the penalty term in the SIC is (k ln n)/n.

The SIC penalizes larger models more severely than the AIC. The SIC is proven to be a statistically consistent model selection method, whereas the AIC is not.
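The two penalties can be compared directly: the SIC penalty exceeds the AIC penalty exactly when ln n > 2, i.e. for n ≥ 8, so the SIC is the harsher criterion at any realistic sample size. A quick check in Python (the n and k values are arbitrary):

```python
import math

def aic_penalty(n, k):
    return 2 * k / n  # AIC penalty term

def sic_penalty(n, k):
    return k * math.log(n) / n  # SIC/BIC penalty term

n, k = 100, 3
print(aic_penalty(n, k), sic_penalty(n, k))  # 0.06 vs roughly 0.138
```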
Mallows Cp
The Cp statistic is defined as

Cp = RSS_p / σ̂²_q − n + 2p

where σ̂²_q is the variance estimate from the largest (full) candidate model with q coefficients. Pick the model with the smallest Cp having the property that Cp ≈ p.
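As a sanity check on the Cp ≈ p benchmark: if σ̂²_q is the unbiased estimate from the full model, σ̂²_q = RSS_q/(n − q), then the full model's own Cp equals q exactly. A sketch in Python (all numbers are made up):

```python
def mallows_cp(rss_p, sigma2, n, p):
    """Mallows' C_p = RSS_p / sigma2 - n + 2p, with sigma2 estimated
    from the full (largest) candidate model."""
    return rss_p / sigma2 - n + 2 * p

# Hypothetical full model: n = 100 observations, q = 5 coefficients, RSS = 200.
n, q, rss_q = 100, 5, 200.0
sigma2 = rss_q / (n - q)  # unbiased variance estimate from the full model
print(mallows_cp(rss_q, sigma2, n, q))  # (n - q) - n + 2q = q = 5
```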
Adjusted R²

The adjusted R² statistic is defined as

R̄² = 1 − (RSS_p / ∑ (yi − ȳ)²) · n/(n − p)

Pick the model with the largest R̄².
Example from Text
Scatterplot Matrix
> pairs(cbind(mort, temp, part))
Four Models
Data from 1970 to 1979 in LA.

Mt represents the cardiac mortality.
Tt represents the temperature.
Pt represents the particulate pollution.

Four proposed models, where T̄ denotes the mean temperature:
1. Mt = β0 + β1 t + wt
2. Mt = β0 + β1 t + β2 (Tt − T̄) + wt
3. Mt = β0 + β1 t + β2 (Tt − T̄) + β3 (Tt − T̄)² + wt
4. Mt = β0 + β1 t + β2 (Tt − T̄) + β3 (Tt − T̄)² + β4 Pt + wt
Example from Text
> temp = temp - mean(temp)
> temp2 = temp^2
> trend = time(mort)
> fit = lm(mort ~ trend + temp + temp2 + part, na.action=NULL)
> summary(fit)
Call:
lm(formula = mort ~ trend + temp + temp2 + part, na.action = NULL)
Residuals:
     Min       1Q   Median       3Q      Max
-19.0760  -4.2153  -0.4878   3.7435  29.2448
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.831e+03  1.996e+02   14.19  < 2e-16 ***
trend       -1.396e+00  1.010e-01  -13.82  < 2e-16 ***
temp        -4.725e-01  3.162e-02  -14.94  < 2e-16 ***
temp2        2.259e-02  2.827e-03    7.99 9.26e-15 ***
part         2.554e-01  1.886e-02   13.54  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.385 on 503 degrees of freedom
Multiple R-Squared: 0.5954, Adjusted R-squared: 0.5922
F-statistic: 185 on 4 and 503 DF, p-value: < 2.2e-16
Example from Text
> summary(aov(fit))
           Df  Sum Sq Mean Sq F value    Pr(>F)
trend       1 10666.9 10666.9 261.621 < 2.2e-16 ***
temp        1  8606.6  8606.6 211.090 < 2.2e-16 ***
temp2       1  3428.7  3428.7  84.094 < 2.2e-16 ***
part        1  7476.1  7476.1 183.362 < 2.2e-16 ***
Residuals 503 20508.4    40.8
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Example from Text
> num = length(mort)
> AIC(fit)/num - log(2*pi) # AIC
[1] 4.721732
> AIC(fit, k=log(num))/num - log(2*pi) # BIC
[1] 4.771699
> log(sum(resid(fit)^2)/num)+(num+5)/(num-5-2) # AICc
[1] 4.722062
Model    RSS  AICc
    1 40,020  5.38
    2 31,413  5.14
    3 27,985  5.03
    4 20,509  4.72
General Principle

If the time series is given as

xt = µt + yt

where µt denotes a trend and yt is stationary, then we estimate the trend to uncover the underlying stationary process by considering

yt = xt − µt
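A minimal Python illustration of this principle on simulated data (the trend here is taken to be linear and is estimated by least squares, paralleling the `lm()` detrending on the next slide):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200, dtype=float)
y = rng.normal(size=t.size)  # stationary part y_t
x = 0.5 + 0.02 * t + y       # x_t = mu_t + y_t with a linear trend mu_t

slope, intercept = np.polyfit(t, x, 1)  # estimate the trend by least squares
y_hat = x - (intercept + slope * t)     # detrended series approximates y_t

print(y_hat.mean())  # least-squares residuals average to (numerically) zero
```

By construction the detrended series has mean zero and no remaining linear association with time.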
Detrending Global Temperature
> globtemp = ts(scan("mydata/globtemp.dat"), start=1856)
> gtemp = window(globtemp, start=1900)
> fit = lm(gtemp ~ time(gtemp), na.action=NULL)
> plot(gtemp, type="o", ylab="Global Temperature Deviation")
> abline(fit)
> plot(fit$res)
Global Temperature as a Random Walk
Suppose the mean is stochastic (not fixed as in the regression context). We can model the mean as a random walk, i.e.
µt = δ + µt−1 + wt
In this case, differencing xt makes it stationary—
∆xt = xt − xt−1 = (µt + yt)− (µt−1 + yt−1) = δ + wt + yt − yt−1
(You will show this is stationary in your homework.)

> plot(diff(gtemp), type="o", main="first difference")
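The effect of differencing can be checked on simulated data. A Python sketch (the drift δ, noise scales, and series length are all arbitrary choices): the level xt wanders without a fixed mean, yet its first difference fluctuates around δ.

```python
import numpy as np

rng = np.random.default_rng(42)
n, delta = 5000, 0.1
w = rng.normal(size=n)
# Random walk with drift: mu_t = delta + mu_{t-1} + w_t, so mu_t = delta*t + cumsum(w)
mu = delta * np.arange(1, n + 1) + np.cumsum(w)
y = rng.normal(scale=0.5, size=n)  # a stationary component y_t
x = mu + y

dx = np.diff(x)   # = delta + w_t + y_t - y_{t-1}
print(dx.mean())  # close to delta = 0.1
```

The sample mean of the differenced series sits near δ, and its variance is stable, consistent with the stationarity you will verify in the homework.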
Backshift and Difference Operators
Definition (Backshift Operator)
The backshift operator, B, is defined by

B xt = xt−1

and repeated backshift operations are represented by

B^k xt = B(B(⋯ B xt)) = xt−k   (B applied k times)

Definition (Difference Operator)
The difference operator, ∆, is defined by

∆ xt = (1 − B) xt = xt − xt−1

and applying it d times gives the difference of order d, represented by

∆^d xt = (1 − B)(1 − B) ⋯ (1 − B) xt = (1 − B)^d xt   (d times)
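Expanding (1 − B)² gives ∆² xt = xt − 2xt−1 + xt−2, which can be verified numerically. A small Python check using repeated differencing (the example series xt = t² is an arbitrary choice):

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])  # x_t = t^2 for t = 1..6

d2_repeat = np.diff(x, n=2)               # apply (1 - B) twice
d2_expand = x[2:] - 2 * x[1:-1] + x[:-2]  # expanded (1 - B)^2 x_t

print(d2_repeat)  # second difference of t^2 is the constant 2
```

The two computations agree, and the quadratic trend is reduced to a constant, which is exactly why order-d differencing removes a degree-d polynomial trend.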
Textbook Reading
Read the following sections from the textbook:
§2.4 (Smoothing in the Time Series Context)
Textbook Problems
Do the following exercises from the textbook:
2.6
2.7