Week 11 Annotated


Page 1: Week11 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 11

© Katja Ignatieva

School of Risk and Actuarial Studies
Australian School of Business
University of New South Wales

[email protected]

Week 11
Probability: Week 1, Week 2, Week 3, Week 4
Estimation: Week 5, Week 6, Review
Hypothesis testing: Week 7, Week 8, Week 9
Linear regression: Week 10, Week 12
Video lectures: Week 1 VL, Week 2 VL, Week 3 VL, Week 4 VL, Week 5 VL

Page 2: Week11 Annotated


Last ten weeks

Introduction to probability;

Moments: (non-)central moments, mean, variance (standard deviation), skewness & kurtosis;

Special univariate (parametric) distributions (discrete & continuous);

Joint distributions;

Convergence, with applications: LLN & CLT;

Estimators (MME, MLE, and Bayesian);

Evaluation of estimators;

Interval estimation.

Page 3: Week11 Annotated


Final two weeks

Simple linear regression:
- Idea;
- Estimation using LSE (& BLUE estimator & relation to MLE);
- Partition of the variability of the variable;
- Testing:
  i) Slope;
  ii) Intercept;
  iii) Regression line;
  iv) Correlation coefficient.

Multiple linear regression:
- Matrix notation;
- LSE estimates;
- Tests;
- R-squared and adjusted R-squared.

Page 4: Week11 Annotated

Multiple Linear Regression

Matrix notation:
- Linear Algebra and Matrix Approach
- The Model in Matrix Form
- Linear models

Statistical Properties of the Least Squares Estimates:
- Statistical Properties of the Least Squares Estimates
- CI and Tests for Individual Regression Parameters
- CI and Tests for Functions of Regression Parameters

Example: Multiple Linear Regression:
- Example regression output
- Exercise: Multiple Linear Regression
- Example: Multiple Linear Regression

Appendix:
- Simple linear regression in matrix form

Page 5: Week11 Annotated

Linear Algebra and Matrix Approach

In general we consider the multiple regression problem:

y = β0 + β1·x1 + β2·x2 + … + βp−1·xp−1,

with data points:

y1  x11  x12  …  x1,p−1
y2  x21  x22  …  x2,p−1
⋮    ⋮    ⋮        ⋮
yn  xn1  xn2  …  xn,p−1

Page 6: Week11 Annotated

Multiple Regression: Linear Algebra and Matrix Approach

Observations yi are collected in a vector y.

The regression coefficients form the p × 1 vector β = [β0, β1, …, βp−1]ᵀ, where ᵀ denotes transpose (so β is a column vector).

The matrix X (size n × p) is:

X = [ 1  x11  x12  …  x1,p−1
      1  x21  x22  …  x2,p−1
      ⋮   ⋮    ⋮        ⋮
      1  xn1  xn2  …  xn,p−1 ]

The predicted values are:

ŷ = Xβ̂.

Page 7: Week11 Annotated

Multiple Regression: Linear Algebra and Matrix Approach

The least squares problem is to select β to minimize:

S(β) = (y − Xβ)ᵀ(y − Xβ).

Proof: see next slides.

Differentiating with respect to each of the β's, the normal equations become:

XᵀXβ = Xᵀy.

If XᵀX is non-singular, the parameter estimates are:

β̂ = (XᵀX)⁻¹Xᵀy.

The residuals are:

ε̂ = y − ŷ = y − Xβ̂.

Page 8: Week11 Annotated

The least squares problem is to find the vector β that minimizes:

S(β) = ∑_{i=1}^n εi² = ∑_{i=1}^n (yi − ŷi)²
     = ∑_{i=1}^n (yi − β0 − β1·xi1 − … − βp−1·xi,p−1)²
     = (y − Xβ)ᵀ(y − Xβ).

Derivation of the least squares estimator:

0 = ∂/∂β (y − Xβ)ᵀ(y − Xβ)
  = ∂/∂β (yᵀy − 2(Xᵀy)ᵀβ + βᵀXᵀXβ)
  = −2Xᵀy + XᵀXβ + (XᵀX)ᵀβ
  = −2Xᵀy + 2XᵀXβ

⇒ Xᵀy = XᵀXβ̂  ⇒  β̂ = (XᵀX)⁻¹Xᵀy.
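As a numerical illustration (not from the slides), the following NumPy sketch generates a small simulated dataset and solves the normal equations XᵀXβ̂ = Xᵀy; the data and names are ours.

```python
# Sketch (our own simulated data): solving the normal equations with NumPy.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                                     # n observations; p = intercept + 2 regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# beta_hat = (X'X)^{-1} X'y; solve() avoids forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta_hat                     # eps_hat = y - X beta_hat
print(beta_hat)                                  # close to beta_true
```

In practice np.linalg.lstsq is often preferred for numerical stability; the solve() form above simply mirrors the slide's formula.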

Page 9: Week11 Annotated

The Least Squares Estimates

Differentiating this expression w.r.t. β and equating it to zero leads to:

XᵀXβ = XᵀY,

i.e., the normal equations. If (XᵀX)⁻¹ exists, the solution is:

β̂ = (XᵀX)⁻¹XᵀY.

The corresponding vector of fitted (or predicted) values of y is:

Ŷ = Xβ̂,

and the vector of residuals:

ε̂ = Y − Ŷ = Y − Xβ̂

gives the differences between the observed and fitted values.


Page 11: Week11 Annotated

The Model in Matrix Form

Consider the regression model of the form:

y = β0 + β1·x1 + … + βp−1·xp−1 + ε.

Fitted to data, the model becomes:

yi = β0 + β1·xi1 + … + βp−1·xi,p−1 + εi,  for i = 1, 2, …, n.

Define the vectors:

Y (n × 1) = [y1, y2, …, yn]ᵀ,  β (p × 1) = [β0, β1, …, βp−1]ᵀ,  and  ε (n × 1) = [ε1, ε2, …, εn]ᵀ.

Page 12: Week11 Annotated

The Model in Matrix Form

Together with the n × p matrix:

X = [ 1  x11  …  x1,p−1
      1  x21  …  x2,p−1
      ⋮   ⋮        ⋮
      1  xn1  …  xn,p−1 ],

write the model in matrix form as:

Y = Xβ + ε,

with Y of size n × 1, X of size n × p, β of size p × 1 and ε of size n × 1. The fitted value is:

Ŷ = Xβ̂.


Page 14: Week11 Annotated

Introduction

To apply linear regression properly:

Effects of the covariates (explanatory variables) must be additive;

The variance must be homoskedastic (constant) (otherwise use an AutoRegressive Conditional Heteroskedasticity (ARCH) model, due to Robert Engle; 2003 Nobel prize in Economics);

Errors must be independent of the explanatory variables with mean zero (weak assumptions);

Errors must be normally distributed, and hence symmetric (needed only for testing, i.e., the strong assumptions).

Page 15: Week11 Annotated

Linear models in general

A linear model treats each response datum yi as an observation on a random variable (Yi | X = x), where E[Yi | X = x] ≡ µi, the εi's are zero-mean random variables independent of X, and the βi's are model parameters whose values are unknown and need to be estimated from data.

The following are examples of linear models:

- Affine form: µi = β0 + xi·β1;
- Polynomial (cubic) form: µi = β0 + xi·β1 + xi²·β2 + xi³·β3;
- Affine form with interaction terms: µi = β0 + xi·β1 + zi·β2 + (xi·zi)·β3.

For all linear forms we have: Yi = µi + εi.

Page 16: Week11 Annotated

Linear models

The first model can be re-written in matrix-vector form as:

[ µ1 ]   [ 1  x1 ]
[ µ2 ]   [ 1  x2 ]   [ β0 ]
[ µ3 ] = [ 1  x3 ] · [ β1 ]  = [1n  x]·β,
[ ⋮  ]   [ ⋮   ⋮ ]
[ µn ]   [ 1  xn ]

where the n × 2 matrix is denoted X. So the model has the general form µ = Xβ, i.e., the expected-value vector µ is given by a model matrix (or design matrix) X multiplied by a parameter vector β.

All linear models can be written in this general form.

Page 17: Week11 Annotated

Linear models

The second model (the cubic) given above can be written in matrix-vector form as:

[ µ1 ]   [ 1  x1  x1²  x1³ ]
[ µ2 ]   [ 1  x2  x2²  x2³ ]   [ β0 ]
[ µ3 ] = [ 1  x3  x3²  x3³ ] · [ β1 ]
[ ⋮  ]   [ ⋮   ⋮    ⋮    ⋮  ]   [ β2 ]
[ µn ]   [ 1  xn  xn²  xn³ ]   [ β3 ]

where the n × 4 matrix is X.

Page 18: Week11 Annotated

Linear models

Models in which the data are divided into different groups, each of which is assumed to have a different mean, are less obviously of the form µ = Xβ, but they can be written in this form using dummy variables.

Consider the model:

yi = βj + εi  if observation i is in group j,

and suppose there are three groups, each with two data points. Then the model can be re-written (a numerical sketch follows below):

[ y1 ]   [ 1  0  0 ]
[ y2 ]   [ 1  0  0 ]   [ β0 ]
[ y3 ] = [ 0  1  0 ] · [ β1 ]  + ε,
[ y4 ]   [ 0  1  0 ]   [ β2 ]
[ y5 ]   [ 0  0  1 ]
[ y6 ]   [ 0  0  1 ]

where the 6 × 3 matrix is X.
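A minimal sketch (illustrative numbers, ours) showing that this dummy-variable design matrix recovers the three group means:

```python
# Sketch: three-group mean model y_i = beta_j + eps_i via dummy variables.
import numpy as np

groups = np.array([0, 0, 1, 1, 2, 2])            # two observations per group
X = np.eye(3)[groups]                            # row i has a 1 in the column of i's group
y = np.array([2.1, 1.9, 5.0, 5.2, 8.9, 9.1])     # made-up data

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # least squares = group sample means
print(beta_hat)                                  # [2.0, 5.1, 9.0]
```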

Page 19: Week11 Annotated

Marginal effects

Assume that we have the multiple regression model:

y = β0 + β1·x1 + … + βp−1·xp−1 + ε.

Assume that xk is a continuous variable, so that if we increase it by one unit while holding the values of the other variables fixed, the value of y becomes:

ynew = β0 + β1·x1 + … + βk·(xk + 1) + … + βp−1·xp−1 + ε.

Since E[ε] = 0, the marginal effect of xk,

βk = E[ynew] − E[y],

is the expected increase (or decrease) in the value of y whenever you increase the value of xk by one unit.


Page 21: Week11 Annotated

Assumptions

The residual terms εi satisfy the following:

E[εi | X = x] = 0, for i = 1, 2, …, n;
Var(εi | X = x) = σ², for i = 1, 2, …, n;
Cov(εi, εj | X = x) = 0, for all i ≠ j.

In words, the residuals have zero mean and common variance, are uncorrelated with the explanatory variables, and are independent of the other residuals.

In matrix form, we have:

E[ε] = 0;
Cov(ε) = σ²·In,

where In is the n × n identity matrix, with ones on the diagonal and zeros off the diagonal.

Page 22: Week11 Annotated

Statistical Properties of the Least Squares Estimates

The following properties of the least squares estimates can be verified:

1. The least squares estimates are unbiased: E[β̂] = β.

2. The variance-covariance matrix of the least squares estimates is: Var(β̂) = σ²·(XᵀX)⁻¹.

3. An unbiased estimate of σ² is:

s² = (y − ŷ)ᵀ(y − ŷ)/(n − p).

Note that:

(n − p)·S²/σ² ∼ χ²(n − p),

and β̂ and S² are independent.
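A small helper (name lse_fit is ours) that computes β̂, the unbiased estimate s², and the estimated covariance matrix s²·(XᵀX)⁻¹ from properties 2 and 3 above:

```python
# Sketch: least squares fit plus the variance estimates from this slide.
import numpy as np

def lse_fit(X, y):
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)                 # s^2 = SSE / (n - p), unbiased for sigma^2
    cov_beta = s2 * np.linalg.inv(X.T @ X)       # estimate of Var(beta_hat)
    return beta_hat, s2, cov_beta
```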

Page 23: Week11 Annotated

Statistical Properties of the Least Squares Estimates

4. Each component β̂k is normally distributed with mean:

E[β̂k] = βk

and variance:

Var(β̂k) = σ²·ckk,

where ckk is the (k + 1)-th diagonal entry of the matrix C = (XᵀX)⁻¹ (because c11 corresponds to the constant), and the covariance between β̂k and β̂l is:

Cov(β̂k, β̂l) = σ²·ckl.


Page 25: Week11 Annotated

CI and Tests for Individual Regression Parameters

The standard error of β̂k is estimated using:

se(β̂k) = s·√ckk.

Under the normality (strong) assumption, we have:

(β̂k − βk)/se(β̂k) ∼ t(n − p).

A 100(1 − α)% confidence interval for βk is given by:

β̂k ± t_{1−α/2, n−p}·se(β̂k).
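Assuming SciPy is available for the t quantile, a sketch (function name ours) of this confidence interval; beta_hat and cov_beta are as returned by the lse_fit helper from the earlier sketch:

```python
# Sketch: 100(1 - alpha)% CI for a single coefficient beta_k.
import numpy as np
from scipy import stats

def coef_ci(beta_hat, cov_beta, n, p, k, alpha=0.05):
    se_k = np.sqrt(cov_beta[k, k])               # se(beta_k) = s * sqrt(c_kk)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)
    return beta_hat[k] - t_crit * se_k, beta_hat[k] + t_crit * se_k
```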

Page 26: Week11 Annotated

CI and Tests for Individual Regression Parameters

In testing the null hypothesis H0: βk = βk0 for some fixed constant βk0, we use the test statistic:

T = (β̂k − βk0)/se(β̂k),

which under the null hypothesis has a t-distribution with n − p degrees of freedom. The most common test is of the significance of the presence of the variable xk, in which case the test statistic simply becomes:

T = β̂k/se(β̂k),

because we test H0: βk = 0 against H1: βk ≠ 0 when we test for the significance/importance of the variable.

Page 27: Week11 Annotated

CI and Tests for Individual Regression Parameters

However, we can always have more general tests for the regression coefficients, as demonstrated in the three cases below:

1. Test the null hypothesis:

H0: βk = βk0

against the alternative:

H1: βk ≠ βk0.

Use the decision rule (using the generalized LRT, week 7):

Reject H0 if: |T| = |(β̂k − βk0)/se(β̂k)| > t_{1−α/2, n−p}.

Page 28: Week11 Annotated

CI and Tests for Individual Regression Parameters

2. Test the hypothesis:

H0: βk = βk0 v.s. H1: βk > βk0.

Use the decision rule (using UMP, week 7):

Reject H0 if: T = (β̂k − βk0)/se(β̂k) > t_{1−α, n−p}.

3. Test the hypothesis:

H0: βk = βk0 v.s. H1: βk < βk0.

Use the decision rule (using UMP, week 7):

Reject H0 if: T = (β̂k − βk0)/se(β̂k) < −t_{1−α, n−p}.


Page 30: Week11 Annotated

CI and Tests for Functions of Regression Parameters

Let D be a matrix (size m × p) of m linear combinations of the explanatory variables. Then we have:

E[Dβ̂] = Dβ;
Var(Dβ̂) = D·Var(β̂)·Dᵀ = σ²·D(XᵀX)⁻¹Dᵀ.

Under the normality (strong) assumption, we have:

D(β̂ − β)/√(s²·D(XᵀX)⁻¹Dᵀ) ∼ t(n − p),

where the denominator is se(Dβ̂). A 100(1 − α)% confidence interval for Dβ is given by:

Dβ̂ ± t_{1−α/2, n−p}·se(Dβ̂).
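The same pattern for an m × p matrix D of linear combinations, following the slide's se(Dβ̂) formula; a sketch with our own function names, returning one interval per row of D:

```python
# Sketch: CIs for the rows of D @ beta, with se from s^2 D (X'X)^{-1} D'.
import numpy as np
from scipy import stats

def linear_combo_ci(D, beta_hat, s2, XtX_inv, n, p, alpha=0.05):
    est = D @ beta_hat
    se = np.sqrt(np.diag(s2 * D @ XtX_inv @ D.T))  # one standard error per row of D
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)
    return est - t_crit * se, est + t_crit * se
```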

Page 31: Week11 Annotated

Adjusted R-Squared

The coefficient of determination is:

R² = (SST − SSE)/SST = 1 − SSE/SST.

In the simple linear regression model, the R-squared provides a descriptive measure of the success of the regressor variables in explaining the variation in the dependent variable.

The R-squared always increases when additional regressor variables are added, even if the added regressors do not strongly influence the dependent variable.

An alternative is to correct it for the number of regressor variables present. Thus, we define the adjusted R-squared:

R²a = 1 − (SSE/(n − p))/(SST/(n − 1)) = 1 − s²/MST = 1 − ((n − 1)/(n − p))·(1 − R²).
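A sketch (helper name ours) computing R² and the adjusted R² from fitted values:

```python
# Sketch: R-squared and adjusted R-squared as defined on this slide.
import numpy as np

def r_squared(y, y_hat, p):
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)               # SSE
    sst = np.sum((y - y.mean()) ** 2)            # SST
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (n - p)) / (sst / (n - 1))
    return r2, r2_adj
```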

Page 32: Week11 Annotated

Can we test whether the regression explains anything significant? E.g., can we jointly test whether [β1, …, βp−1]ᵀ = 0 (note: excluding β0)?

Use the F-statistic:

F = (|X̃β̂|²/(p − 1))/(|ε̂|²/(n − p)) = (SSM/(p − 1))/(SSE/(n − p)) ∼ F_{p−1, n−p}.

Under the strong assumptions, |X̃β̂|²/σ² ∼ χ²_{p−1} and |ε̂|²/σ² ∼ χ²_{n−p} are chi-squared distributed (note: X̃ is the matrix X without the constant column).

Interpretation: if the regression model explains a large proportion of the variability in y, then |X̃β̂|² should be large and |ε̂|² should be small.

Hence, test H0: β1 = … = βp−1 = 0 v.s. H1: at least one βk ≠ 0.

Reject H0 if F > F_{p−1, n−p}(1 − α).
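A sketch of this overall F-test (names ours), with the p-value taken from SciPy's F distribution:

```python
# Sketch: F = (SSM/(p-1)) / (SSE/(n-p)) and its p-value.
import numpy as np
from scipy import stats

def overall_f_test(y, y_hat, p):
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)
    ssm = np.sum((y_hat - y.mean()) ** 2)
    f = (ssm / (p - 1)) / (sse / (n - p))
    p_value = stats.f.sf(f, p - 1, n - p)        # 1 - F_{p-1, n-p}(f)
    return f, p_value
```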

Page 33: Week11 Annotated

ANOVA table and sums of squares:

- SST is the total variability in the absence of knowledge of the variables X1, …, Xp−1;
- SSE is the total variability remaining after introducing the effect of X1, …, Xp−1;
- SSM is the total variability "explained" because of knowledge of X1, …, Xp−1.

This partitioning of the variability is used in ANOVA tables:

Source      Sum of squares               Degrees of freedom  Mean square    F        p-value
Regression  SSM = ∑_{i=1}^n (ŷi − ȳ)²    DFM = p − 1         MSM = SSM/DFM  MSM/MSE  1 − F_{DFM,DFE}(F)
Error       SSE = ∑_{i=1}^n (yi − ŷi)²   DFE = n − p         MSE = SSE/DFE
Total       SST = ∑_{i=1}^n (yi − ȳ)²    DFT = n − 1         MST = SST/DFT


Page 35: Week11 Annotated

Example regression output (= summary)

Error variance and standard deviation:

s²: MSE = ∑_{i=1}^n ε̂i²/(n − p)
CI for σ²: ( SSE/χ²_{1−α/2}(n − p) , SSE/χ²_{α/2}(n − p) )
s: √s²
CI for σ: ( √(SSE/χ²_{1−α/2}(n − p)) , √(SSE/χ²_{α/2}(n − p)) )

ANOVA

Source      Sum of squares               Degrees of freedom  Mean square    F        p-value
Regression  SSM = ∑_{i=1}^n (ŷi − ȳ)²    DFM = p − 1         MSM = SSM/DFM  MSM/MSE  1 − F_{DFM,DFE}(F)
Error       SSE = ∑_{i=1}^n (yi − ŷi)²   DFE = n − p         MSE = SSE/DFE
Total       SST = ∑_{i=1}^n (yi − ȳ)²    DFT = n − 1         MST = SST/DFT

Page 36: Week11 Annotated

Example regression output (cont.) (= summary)

R²: 1 − SSE/SST        R: √R²
R²a: 1 − (SSE/(n − p))/(SST/(n − 1))        Ra: √R²a

Coefficients:

β̂ = (XᵀX)⁻¹Xᵀy
se(β̂k) = √(Cov(β̂)kk)
t = β̂k/se(β̂k)
p-value = 2·(1 − t_{n−p}(|t|))
CI(β̂k) = ( β̂k − t_{1−α/2}(n − p)·se(β̂k) , β̂k + t_{1−α/2}(n − p)·se(β̂k) )

Covariance matrix:

Cov(β̂) = s²·(XᵀX)⁻¹


Page 38: Week11 Annotated

Exercise regression

Given is the following linear regression:

Yi = β0 + β1·x1i + β2·x2i + εi.

For our sample with 20 observations we have ∑_{i=1}^{20} (yi − ȳ)² = 53.82, together with:

(XᵀX)⁻¹ = [  0.19  −0.08  −0.04
            −0.08   0.11  −0.03
            −0.04  −0.03   0.05 ]

β̂ = [0.20, 0.93, 0.95]ᵀ

∑_{i=1}^{20} ε̂i² = 11.67

a. Question: What is the estimate of the variance of the residuals?
b. Question: What is the 95% CI for β1?
c. Question: What is the 95% CI for β1 − β2?
d. Question: Are X1 and X2 jointly significant?

Page 39: Week11 Annotated

Exercise regression

a. Solution: s² = ∑_{i=1}^{20} ε̂i²/(n − p) = 11.67/17 = 0.69.

b. Solution: Var(β̂1) = s²·c11 = 0.69 · 0.11 = 0.076 ⇒ se(β̂1) = √0.076 = 0.276.

F&T page 163: t0.975(17) = 2.110, thus the 95% CI for β1 is:

(β̂1 − t0.975(17)·se(β̂1), β̂1 + t0.975(17)·se(β̂1)) = (0.35, 1.51).

c. Solution: D = [0  1  −1]; Var(Dβ̂) = s²·D(XᵀX)⁻¹Dᵀ, i.e.:

Var(Dβ̂) = 0.69 · [0  1  −1] · [  0.19  −0.08  −0.04     [  0
                                 −0.08   0.11  −0.03   ·    1
                                 −0.04  −0.03   0.05 ]     −1 ]

         = 0.69 · [−0.04  0.14  −0.08] · [0, 1, −1]ᵀ = 0.69 · 0.22 = 0.151.

Page 40: Week11 Annotated

Exercise regression

c. Solution (cont.): se(Dβ̂) = √Var(Dβ̂) = √0.151 = 0.389.

F&T page 163: t0.975(17) = 2.110, thus the 95% CI for β1 − β2 is:

(β̂1 − β̂2 − t0.975(17)·se(Dβ̂), β̂1 − β̂2 + t0.975(17)·se(Dβ̂)) = (−0.84, 0.80).

d. Solution: SST = 53.82; SSE = 11.67; SSM = SST − SSE = 42.15;
MSM = 42.15/2 = 21.08; MSE = 11.67/17 = 0.687; F = 21.08/0.687 ≈ 30.7.

F_{2,17}(0.99) = 6.112, thus X1 and X2 are jointly significant even at α = 0.01.
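The exercise arithmetic can be checked numerically; the sketch below uses only the quantities given on the slide (NumPy and SciPy assumed):

```python
# Sketch: reproducing the exercise answers from the given summary numbers.
import numpy as np
from scipy import stats

n, p = 20, 3
sse, sst = 11.67, 53.82
s2 = sse / (n - p)                               # a. approx 0.69
C = np.array([[ 0.19, -0.08, -0.04],
              [-0.08,  0.11, -0.03],
              [-0.04, -0.03,  0.05]])            # (X'X)^{-1}
beta = np.array([0.20, 0.93, 0.95])
t_crit = stats.t.ppf(0.975, n - p)               # approx 2.110

se_b1 = np.sqrt(s2 * C[1, 1])                    # b. se(beta_1) approx 0.276
print(beta[1] - t_crit * se_b1, beta[1] + t_crit * se_b1)   # approx (0.35, 1.51)

D = np.array([0.0, 1.0, -1.0])                   # c. contrast beta_1 - beta_2
se_d = np.sqrt(s2 * D @ C @ D)                   # approx 0.389
print(beta[1] - beta[2] - t_crit * se_d,
      beta[1] - beta[2] + t_crit * se_d)                    # approx (-0.84, 0.80)

f = ((sst - sse) / (p - 1)) / (sse / (n - p))    # d. approx 30.7
print(f, stats.f.sf(f, p - 1, n - p))            # p-value well below 0.01
```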


Page 42: Week11 Annotated

Example: Multiple Linear Regression

We use a dataset consisting of salaries of football players and some regressor variables that may influence their salaries:

1. SALARY = "player's salary";
2. DRAFT = "the round in which the player was originally drafted";
3. YRSEXP = "the player's experience in years";
4. PLAYED = "the number of games played in the previous year";

Page 43: Week11 Annotated

Example: Multiple Linear Regression

Regressor variables (cont.):

5. STARTED = "the number of games started in the previous year";
6. CITYPOP = "the population of the city in which the player is domiciled";
7. OFFBACK = "an indicator of the player's position in the game" (takes value 1 = offback, 0 = others), i.e., it is a dummy variable.

Page 44: Week11 Annotated

Example: Multiple Linear Regression

Summary Statistics of Variables in the Football Players Salary Data

          Count   Mean      Median    Std Dev   Minimum   Maximum
SALARY    169     336809    265000    255118    75000     1500000
DRAFT     169     6.473     5         4.61      1         13
YRSEXP    169     4.077     4         3.352     0         17
PLAYED    169     10.237    14        6.999     0         16
STARTED   169     5.97      1         6.859     0         16
CITYPOP   169     4980435   2421000   5098109   1176000   18120000
OFFBACK   169     0.2367    0         0.4263    0         1

Page 45: Week11 Annotated

Example: Multiple Linear Regression

The Correlation Matrix

          SALARY   DRAFT    YRSEXP   PLAYED   STARTED   CITYPOP
DRAFT     -0.454
YRSEXP     0.345   -0.059
PLAYED     0.212   -0.108    0.646
STARTED    0.440   -0.253    0.557    0.633
CITYPOP    0.077   -0.126    0.129    0.193    0.178
OFFBACK    0.179   -0.209   -0.050   -0.043   -0.081    -0.067

Page 46: Week11 Annotated

Example: Multiple Linear Regression

[Figure slide; image not included in this transcript.]

Page 47: Week11 Annotated

Example: Multiple Linear Regression

[Figure slide; image not included in this transcript.]

Page 48: Week11 Annotated

Example: Multiple Linear Regression

ANOVA Table

Source      Degrees of freedom   Sum of Squares   Mean Square         F-Ratio   Prob(> F)
Regression  p − 1                SSM              MSM = SSM/(p − 1)   MSM/MSE   p-value
Error       n − p                SSE              MSE = SSE/(n − p)
Total       n − 1                SST              MST = SST/(n − 1)

Page 49: Week11 Annotated

Example: Multiple Linear Regression

From this ANOVA table, we can derive several statistics that can be used to summarise the quality of the regression model. For example:

- The coefficient of determination is defined by:

R² = SSM/SST

and has the interpretation that it gives the proportion of the total variability that is explained by the regression equation.

Page 50: Week11 Annotated

Example: Multiple Linear Regression

- The adjusted coefficient of determination is defined by:

R²a = 1 − (SSE/(n − p))/(SST/(n − 1)) = 1 − s²/S²y

and has the same interpretation as the R-squared, except that it is adjusted for the number of regressor variables.

In multiple regression, the R-squared increases as the number of variables increases, but not necessarily so for the adjusted R-squared: it increases only if an influential variable is added.

Page 51: Week11 Annotated

Example: Multiple Linear Regression

- The size of a typical error, denoted by s, is the square root of s² and is also the square root of the error mean square:

s = √s² = √MSE = √(SSE/(n − p)).

It gives the average deviation of the actual y from the value predicted by the regression equation.

Page 52: Week11 Annotated

Example: Multiple Linear Regression

- The F-ratio, defined by:

F-ratio = MSM/MSE,

is the test statistic used for model adequacy.

It provides another indication of how good the model is; its corresponding p-value should be as small as possible.

Page 53: Week11 Annotated

Example: Multiple Linear Regression

Summary of the results of the regression of the players' salaries against the regressor variables:

Regression Analysis

The regression equation is

SALARY = 361663 - 19139 DRAFT + 21301 YRSEXP - 7948 PLAYED

+ 12965 STARTED - 0.00070 CITYPOP + 82941 OFFBACK

Predictor Coef SE Coef T p

Constant 361663 43734 8.17 0.000

DRAFT -19139 3674 -5.21 0.000

YRSEXP 21301 6370 3.34 0.001

PLAYED -7948 3281 -2.42 0.017

STARTED 12965 3189 4.07 0.000

CITYPOP -0.000699 0.003176 -0.22 0.826

OFFBACK 82941 38241 2.17 0.032

S = 203817 R-sq = 38.5% R-sq(adj) = 36.2%


Page 54: Week11 Annotated

Example: Multiple Linear Regression

ANOVA Table:

Analysis of Variance

SOURCE DF SS MS F p

Regression 6 4.20463E+12 7.00772E+11 16.87 0.000

Error 162 6.72970E+12 41541379329

Total 168 1.09343E+13


Page 55: Week11 Annotated

Example: Multiple Linear Regression

[Figure slide; image not included in this transcript.]

Page 56: Week11 Annotated

Example: Multiple Linear Regression

[Figure slide; image not included in this transcript.]

Page 57: Week11 Annotated

Example: Multiple Linear Regression

Improving the Regression Model

Here we give a summary of the results of the improved regression model:

Regression Analysis

The regression equation is

LOGSAL = 11.8 + 0.0733 YRSEXP - 0.00981 PLAYED + 0.0264 STARTED

+ 0.000000 CITYPOP + 0.187 OFFBACK + 0.933 1/DRAFT

Predictor Coef SE Coef T p

Constant 11.7509 0.0814 144.42 0.000

YRSEXP 0.07332 0.01471 4.98 0.000

PLAYED -0.009815 0.007607 -1.29 0.199

STARTED 0.026380 0.007596 3.47 0.001

CITYPOP 0.00000001 0.00000001 0.70 0.482

OFFBACK 0.18741 0.08691 2.16 0.033

1/DRAFT 0.9334 0.1242 7.52 0.000

S = 0.4713 R-sq = 54.6% R-sq(adj) = 52.9%


Page 58: Week11 Annotated

Example: Multiple Linear Regression

New ANOVA Table:

Analysis of Variance

SOURCE DF SS MS F p

Regression 6 43.3145 7.2191 32.50 0.000

Error 162 35.9891 0.2222

Total 168 79.3035


Page 59: Week11 Annotated

Example: Multiple Linear Regression

[Figure slide; image not included in this transcript.]

Page 60: Week11 Annotated

Example: Multiple Linear Regression

[Figure slide; image not included in this transcript.]


Page 62: Week11 Annotated

Appendix: Simple linear regression in matrix form

For simple linear regression in matrix form we have:

y = [y1, y2, …, yn]ᵀ,   X = [ 1  x1
                              1  x2
                              ⋮   ⋮
                              1  xn ].

Hence:

XᵀX = [ n             ∑_{i=1}^n xi
        ∑_{i=1}^n xi  ∑_{i=1}^n xi² ]

and

(XᵀX)⁻¹ = 1/(n·∑_{i=1}^n xi² − (∑_{i=1}^n xi)²) · [  ∑_{i=1}^n xi²  −∑_{i=1}^n xi
                                                     −∑_{i=1}^n xi    n           ],

where the determinant satisfies n·∑_{i=1}^n xi² − (∑_{i=1}^n xi)² = n·∑_{i=1}^n (xi − x̄)².

Page 63: Week11 Annotated

Thus:

Xᵀy = [ ∑_{i=1}^n yi
        ∑_{i=1}^n xi·yi ].

Hence:

β̂ = [β̂0, β̂1]ᵀ = (XᵀX)⁻¹(Xᵀy)

   = 1/(n·∑_{i=1}^n (xi − x̄)²) · [ ∑_{i=1}^n xi² · ∑_{i=1}^n yi − ∑_{i=1}^n xi · ∑_{i=1}^n xi·yi
                                    n·∑_{i=1}^n xi·yi − ∑_{i=1}^n xi · ∑_{i=1}^n yi ].
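As a closing check (our own simulated data), the matrix formula above reproduces the familiar simple-regression slope and intercept:

```python
# Sketch: matrix least squares vs. the textbook closed form for one regressor.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=30)

X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)         # [beta0_hat, beta1_hat]

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(beta, (b0, b1))                            # the two agree
```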