STA 6857—Model Selection & Exploratory Data Analysis (§2.2 cont., 2.3)
Model Selection Removing the Trend Homework 2c
Outline
1 Model Selection
2 Removing the Trend
3 Homework 2c
Arthur Berg STA 6857—Model Selection & Exploratory Data Analysis (§2.2 cont., 2.3) 2/ 27
Model Selection with the F-Statistic
Forward Step — Start with a minimal model. Increase the model by adding in the variable which produces the largest F-statistic. Repeat until no additional variable produces a significant F-statistic.

Backward Step — Start with a full model. Decrease the model by removing the variable which gives the smallest contribution. Repeat until all remaining variables are significant.

Forward/Backward Step — Increase the model by adding in the variable which produces the largest F-statistic. Decrease the model by removing the weakest variable as long as its contribution is not significant. Repeat until no additional variable produces a significant F-statistic and all variables in the model are significant.
Example
Minimal Model: xt = β1 + wt
Complete Model: xt = β1 + β2 t + β3 t^2 + β4 sin t + wt

Use the above procedures to find a model in between the minimal and complete models, like: xt = β1 + β4 sin t + wt.
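As a complement to the R session on the following slides, the backward-step idea can be sketched numerically. The sketch below is in Python/NumPy rather than the course's R; the simulated data, candidate columns, and F cutoff are all hypothetical choices for illustration.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def backward_step(X, y, f_crit=4.0):
    """Backward elimination: repeatedly drop the column whose partial
    F-statistic is smallest, as long as that F is below f_crit.
    Column 0 (the intercept) is always kept."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        rss_full = rss(X[:, keep], y)
        df = len(y) - len(keep)
        # partial F for dropping column j: ((RSS_reduced - RSS_full)/1) / (RSS_full/df)
        fs = {j: (rss(X[:, [c for c in keep if c != j]], y) - rss_full) / (rss_full / df)
              for j in keep[1:]}
        weakest = min(fs, key=fs.get)
        if fs[weakest] >= f_crit:
            break  # every remaining variable contributes significantly
        keep.remove(weakest)
    return keep

# Simulated data: the true model only involves the intercept and linear term.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 101)
y = 1 + t + rng.normal(size=t.size)
X = np.column_stack([np.ones_like(t), t, t**2, np.sin(t)])
print(backward_step(X, y))  # the intercept (0) and linear term (1) survive
```

The partial F here compares the RSS with and without a single column, which mirrors the F-statistic comparisons that `anova()` reports in the R example that follows.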
Example Using Backward Step with the F-Statistic in R
You may wish to view my slides on Linear Models with R.

> x <- seq(0, 10, .1)
> y <- 1 + x + cos(x) + rnorm(x)
> yy <- 1 + x + cos(x)
> plot(x, y, lwd = 3)
> lines(x, yy, lwd = 3)
Example in R (cont.)

> f1 <- lm(y ~ x + I(x^2) + I(x^3) + sin(x) + cos(x) + exp(x))
> anova(f1)
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq  F value    Pr(>F)
x          1 692.94  692.94 515.5497 < 2.2e-16 ***
I(x^2)     1   0.03    0.03   0.0214 0.8841297
I(x^3)     1  19.69   19.69  14.6468 0.0002335 ***
sin(x)     1   1.42    1.42   1.0552 0.3069485
cos(x)     1  11.01   11.01   8.1934 0.0051832 **
exp(x)     1   0.03    0.03   0.0192 0.8900193
Residuals 94 126.34    1.34
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> f2 <- lm(y ~ x + I(x^2) + I(x^3) + sin(x) + cos(x))
> anova(f2)
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq  F value    Pr(>F)
x          1 692.94  692.94 520.9277 < 2.2e-16 ***
I(x^2)     1   0.03    0.03   0.0216 0.8835280
I(x^3)     1  19.69   19.69  14.7996 0.0002164 ***
sin(x)     1   1.42    1.42   1.0662 0.3044251
cos(x)     1  11.01   11.01   8.2789 0.0049533 **
Residuals 95 126.37    1.33
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Example in R (cont.)
> f3 <- lm(y ~ x + I(x^3) + sin(x) + cos(x))
> anova(f3)
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq  F value   Pr(>F)
x          1 692.94  692.94 526.2840 < 2.2e-16 ***
I(x^3)     1   0.34    0.34   0.2549   0.6148
sin(x)     1   0.16    0.16   0.1231   0.7265
cos(x)     1  31.62   31.62  24.0135 3.88e-06 ***
Residuals 96 126.40    1.32
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> f4 <- lm(y ~ x + I(x^3) + cos(x))
> anova(f4)
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq  F value    Pr(>F)
x          1 692.94  692.94 531.1043 < 2.2e-16 ***
I(x^3)     1   0.34    0.34   0.2573    0.6132
cos(x)     1  31.62   31.62  24.2370 3.492e-06 ***
Residuals 97 126.56    1.30
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Example in R (cont.)
> f5 <- lm(y ~ x + cos(x))
> anova(f5)
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x          1 692.94  692.94 533.564 < 2.2e-16 ***
cos(x)     1  31.24   31.24  24.057 3.717e-06 ***
Residuals 98 127.27    1.30
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> lines(x, predict(f5), lwd=3, col="magenta")
Estimation of σ²

The unbiased estimate of the variance of the WN, σ², based on a model with k coefficients is given by

s²_k = RSS_k / (n − k)

The maximum likelihood estimate of the variance with k coefficients is given by

σ̂²_k = RSS_k / n

From classical estimation of the variance:

Unbiased: (1/(n−1)) ∑ (xi − x̄)²
ML: (1/n) ∑ (xi − x̄)²
Best (in terms of MSE and under normality): (1/(n+1)) ∑ (xi − x̄)²

where each sum runs over i = 1, …, n.
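The three denominators can be checked on a toy sample. A quick Python illustration (the data values here are made up):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)
ss = np.sum((x - x.mean()) ** 2)  # sum of squared deviations; here 32

s2_unbiased = ss / (n - 1)  # divide by n-1: unbiased estimator
s2_ml       = ss / n        # divide by n:   maximum likelihood estimator
s2_best     = ss / (n + 1)  # divide by n+1: smallest MSE under normality

print(s2_unbiased, s2_ml, s2_best)  # the estimates shrink left to right
```

The three estimates are ordered: dividing by a larger denominator shrinks the estimate, trading a little bias for lower variance.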
Akaike’s Information Criterion (AIC)
General model selection idea:

Criterion = Model Error + Penalty for Model Complexity

where the model error term (think of RSS) should be as small as possible, and the penalty term increases with the number of parameters.

Definition (AIC)
The AIC for k parameters is defined as

AIC = ln σ̂²_k + (n + 2k)/n

Definition (AICc — AIC for small sample sizes (small n))
The bias-corrected AIC (AICc) for k parameters is defined as

AICc = ln σ̂²_k + (n + k)/(n − k − 2)
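Both criteria are computable from RSS alone. A small Python helper, evaluated on numbers from the mortality example later in these slides (model 4 there has RSS = 20,509 with 5 coefficients; n = 508 is inferred from the reported 503 residual degrees of freedom). Note that R's built-in `AIC()` counts σ² as an extra parameter, so its value differs slightly from this slide's formula.

```python
import math

def aic(rss_k, n, k):
    """AIC as defined above: ln(sigma2_k) + (n + 2k)/n, with sigma2_k = RSS_k/n."""
    return math.log(rss_k / n) + (n + 2 * k) / n

def aicc(rss_k, n, k):
    """Bias-corrected AIC: ln(sigma2_k) + (n + k)/(n - k - 2)."""
    return math.log(rss_k / n) + (n + k) / (n - k - 2)

# Model 4 of the mortality example: RSS = 20,509, n = 508, k = 5 coefficients.
print(round(aic(20509, 508, 5), 2))   # about 4.72
print(round(aicc(20509, 508, 5), 2))  # about 4.72, matching the table later
```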
Schwarz’s Information Criterion (SIC) aka Bayesian Information Criterion (BIC)

Definition (SIC/BIC)
The SIC for k parameters is defined as

SIC = ln σ̂²_k + (k ln n)/n

The penalty term in the AIC is 2k/n; the penalty term in the SIC is (k ln n)/n.

The SIC penalizes larger models more severely than the AIC. The SIC is proven to be a statistically consistent model selection method, whereas the AIC is not.
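The two penalties can be compared directly: the SIC penalty exceeds the AIC penalty exactly when ln n > 2, i.e. for n ≥ 8, so the SIC is the harsher criterion at any realistic sample size. A quick check in Python (the n and k values are arbitrary):

```python
import math

def aic_penalty(n, k):
    return 2 * k / n  # AIC penalty term

def sic_penalty(n, k):
    return k * math.log(n) / n  # SIC/BIC penalty term

n, k = 100, 3
print(aic_penalty(n, k), sic_penalty(n, k))  # 0.06 vs roughly 0.138
```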
Mallows Cp
The Cp statistic is defined as

Cp = RSS_p / σ̂²_q − n + 2p

where σ̂²_q is the variance estimate from the largest (full) candidate model with q coefficients. Pick the model with the smallest Cp having the property that Cp ≈ p.
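As a sanity check on the Cp ≈ p benchmark: if σ̂²_q is the unbiased estimate from the full model, σ̂²_q = RSS_q/(n − q), then the full model's own Cp equals q exactly. A sketch in Python (all numbers are made up):

```python
def mallows_cp(rss_p, sigma2, n, p):
    """Mallows' C_p = RSS_p / sigma2 - n + 2p, with sigma2 estimated
    from the full (largest) candidate model."""
    return rss_p / sigma2 - n + 2 * p

# Hypothetical full model: n = 100 observations, q = 5 coefficients, RSS = 200.
n, q, rss_q = 100, 5, 200.0
sigma2 = rss_q / (n - q)  # unbiased variance estimate from the full model
print(mallows_cp(rss_q, sigma2, n, q))  # (n - q) - n + 2q = q = 5
```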
Adjusted R²

The adjusted R² statistic is defined as

R̄² = 1 − (RSS_p / ∑ (yi − ȳ)²) · n/(n − p)

Pick the model with the largest R̄².
Example from Text
Scatterplot Matrix
> pairs(cbind(mort, temp, part))
Four Models
Data from 1970 to 1979 in LA.

Mt represents the cardiac mortality.
Tt represents the temperature.
Pt represents the particulate pollution.

Four proposed models, where T̄ denotes the mean temperature:
1. Mt = β0 + β1 t + wt
2. Mt = β0 + β1 t + β2 (Tt − T̄) + wt
3. Mt = β0 + β1 t + β2 (Tt − T̄) + β3 (Tt − T̄)² + wt
4. Mt = β0 + β1 t + β2 (Tt − T̄) + β3 (Tt − T̄)² + β4 Pt + wt
Example from Text
> temp = temp - mean(temp)
> temp2 = temp^2
> trend = time(mort)
> fit = lm(mort ~ trend + temp + temp2 + part, na.action=NULL)
> summary(fit)
Call:
lm(formula = mort ~ trend + temp + temp2 + part, na.action = NULL)
Residuals:
     Min       1Q   Median       3Q      Max
-19.0760  -4.2153  -0.4878   3.7435  29.2448
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.831e+03  1.996e+02   14.19  < 2e-16 ***
trend       -1.396e+00  1.010e-01  -13.82  < 2e-16 ***
temp        -4.725e-01  3.162e-02  -14.94  < 2e-16 ***
temp2        2.259e-02  2.827e-03    7.99 9.26e-15 ***
part         2.554e-01  1.886e-02   13.54  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.385 on 503 degrees of freedom
Multiple R-Squared: 0.5954, Adjusted R-squared: 0.5922
F-statistic: 185 on 4 and 503 DF, p-value: < 2.2e-16
Example from Text
> summary(aov(fit))
           Df  Sum Sq Mean Sq F value    Pr(>F)
trend       1 10666.9 10666.9 261.621 < 2.2e-16 ***
temp        1  8606.6  8606.6 211.090 < 2.2e-16 ***
temp2       1  3428.7  3428.7  84.094 < 2.2e-16 ***
part        1  7476.1  7476.1 183.362 < 2.2e-16 ***
Residuals 503 20508.4    40.8
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Example from Text
> num = length(mort)
> AIC(fit)/num - log(2*pi) # AIC
[1] 4.721732
> AIC(fit, k=log(num))/num - log(2*pi) # BIC
[1] 4.771699
> log(sum(resid(fit)^2)/num)+(num+5)/(num-5-2) # AICc
[1] 4.722062
Model    RSS  AICc
    1 40,020  5.38
    2 31,413  5.14
    3 27,985  5.03
    4 20,509  4.72
General Principle

If the time series is given as

xt = µt + yt

where µt denotes a trend and yt is stationary, then we estimate the trend to uncover the underlying stationary process by considering

yt = xt − µt
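A minimal Python illustration of this principle on simulated data (the trend here is taken to be linear and is estimated by least squares, paralleling the `lm()` detrending on the next slide):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200, dtype=float)
y = rng.normal(size=t.size)  # stationary part y_t
x = 0.5 + 0.02 * t + y       # x_t = mu_t + y_t with a linear trend mu_t

slope, intercept = np.polyfit(t, x, 1)  # estimate the trend by least squares
y_hat = x - (intercept + slope * t)     # detrended series approximates y_t

print(y_hat.mean())  # least-squares residuals average to (numerically) zero
```

By construction the detrended series has mean zero and no remaining linear association with time.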
Detrending Global Temperature
> globtemp = ts(scan("mydata/globtemp.dat"), start=1856)
> gtemp = window(globtemp, start=1900)
> fit = lm(gtemp ~ time(gtemp), na.action=NULL)
> plot(gtemp, type="o", ylab="Global Temperature Deviation")
> abline(fit)
> plot(fit$res)
Global Temperature as a Random Walk
Suppose the mean is stochastic (not fixed as in the regression context). We can model the mean as a random walk, i.e.
µt = δ + µt−1 + wt
In this case, differencing xt makes it stationary—
∆xt = xt − xt−1 = (µt + yt)− (µt−1 + yt−1) = δ + wt + yt − yt−1
(You will show this is stationary in your homework.)

> plot(diff(gtemp), type="o", main="first difference")
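The effect of differencing can be checked on simulated data. A Python sketch (the drift δ, noise scales, and series length are all arbitrary choices): the level xt wanders without a fixed mean, yet its first difference fluctuates around δ.

```python
import numpy as np

rng = np.random.default_rng(42)
n, delta = 5000, 0.1
w = rng.normal(size=n)
# Random walk with drift: mu_t = delta + mu_{t-1} + w_t, so mu_t = delta*t + cumsum(w)
mu = delta * np.arange(1, n + 1) + np.cumsum(w)
y = rng.normal(scale=0.5, size=n)  # a stationary component y_t
x = mu + y

dx = np.diff(x)   # = delta + w_t + y_t - y_{t-1}
print(dx.mean())  # close to delta = 0.1
```

The sample mean of the differenced series sits near δ, and its variance is stable, consistent with the stationarity you will verify in the homework.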
Backshift and Difference Operators
Definition (Backshift Operator)
The backshift operator, B, is defined by

B xt = xt−1

and repeated backshift operations are represented by

B^k xt = B(B(⋯ B xt)) = xt−k   (B applied k times)

Definition (Difference Operator)
The difference operator, ∆, is defined by

∆ xt = (1 − B) xt = xt − xt−1

and applying it d times gives the difference of order d, represented by

∆^d xt = (1 − B)(1 − B) ⋯ (1 − B) xt = (1 − B)^d xt   (d times)
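Expanding (1 − B)² gives ∆² xt = xt − 2xt−1 + xt−2, which can be verified numerically. A small Python check using repeated differencing (the example series xt = t² is an arbitrary choice):

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])  # x_t = t^2 for t = 1..6

d2_repeat = np.diff(x, n=2)               # apply (1 - B) twice
d2_expand = x[2:] - 2 * x[1:-1] + x[:-2]  # expanded (1 - B)^2 x_t

print(d2_repeat)  # second difference of t^2 is the constant 2
```

The two computations agree, and the quadratic trend is reduced to a constant, which is exactly why order-d differencing removes a degree-d polynomial trend.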
Textbook Reading
Read the following sections from the textbook:
§2.4 (Smoothing in the Time Series Context)
Textbook Problems
Do the following exercises from the textbook:
2.6
2.7