Math 344: Regression Analysis
Chapter 1 — Linear Regression with One Predictor Variable
Wenge Guo
January 19, 2012



Why regression?

- Want to model a functional relationship between a "predictor variable" (input, independent variable, etc.) and a "response variable" (output, dependent variable, etc.)

- Examples?

- But the real world is noisy; there is no exact f = ma
  - Observation noise
  - Process noise

- Two distinct goals
  - (Estimation) Understanding the relationship between predictor variables and response variables
  - (Prediction) Predicting the future response given newly observed predictors


History

- Sir Francis Galton, 19th century
  - Studied the relation between heights of parents and children and noted that the children "regressed" to the population mean

- "Regression" stuck as the term to describe statistical relations between variables


Example Applications

Trend lines, e.g., Google over 6 months.


Others

- Epidemiology
  - Relating lifespan to obesity or smoking habits, etc.

- Science and engineering
  - Relating physical inputs to physical outputs in complex systems

- Brain


Aims for the course

- Given something you would like to predict and some number of covariates
  - What kind of model should you use?
  - Which variables should you include?
  - Which transformations of variables and interaction terms should you use?

- Given a model and some data
  - How do you fit the model to the data?
  - How do you express confidence in the values of the model parameters?
  - How do you regularize the model to avoid over-fitting and other related issues?



Data for Regression Analysis

- Observational Data
  Example: relation between age of employee (X) and number of days of illness last year (Y). Cannot be controlled!

- Experimental Data
  Example: an insurance company wishes to study the relation between productivity of its analysts in processing claims (Y) and length of training (X).
  - Treatment: the length of training
  - Experimental Units: the analysts included in the study

- Completely Randomized Design: the most basic type of statistical design
  Example: same example, but every experimental unit has an equal chance to receive any one of the treatments.


Simple Linear Regression

- Want to find parameters for a function of the form

Yi = β0 + β1Xi + εi

- The distribution of the error random variable is not specified


Quick Example - Scatter Plot

[Scatter plot of response b against predictor a; a runs roughly from −2 to 2 and b from −2 to 8.]


Formal Statement of Model

Yi = β0 + β1Xi + εi

- Yi is the value of the response variable in the i-th trial
- β0 and β1 are parameters
- Xi is a known constant, the value of the predictor variable in the i-th trial
- εi is a random error term with mean E(εi) = 0 and variance Var(εi) = σ²
- i = 1, . . . , n


Properties

- The response Yi is the sum of two components
  - Constant term β0 + β1Xi
  - Random term εi

- The expected response is

E(Yi) = E(β0 + β1Xi + εi)
      = β0 + β1Xi + E(εi)
      = β0 + β1Xi


Expectation Review

- Definition

E(X) = ∫ X P(X) dX,  X ∈ R

- Linearity property

E(aX) = a E(X)
E(aX + bY) = a E(X) + b E(Y)

- Obvious from the definition


Example Expectation Derivation

P(X) = 2X,  0 ≤ X ≤ 1

Expectation:

E(X) = ∫₀¹ X P(X) dX
     = ∫₀¹ 2X² dX
     = (2X³/3) |₀¹
     = 2/3
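As a quick numerical sanity check (a sketch, not part of the original slides), the integral can be approximated with a simple midpoint rule:

```python
# Midpoint-rule approximation of E(X) = ∫_0^1 X * P(X) dX with P(X) = 2X.
def expected_value(n_steps=100_000):
    h = 1.0 / n_steps
    total = 0.0
    for k in range(n_steps):
        x = (k + 0.5) * h           # midpoint of the k-th subinterval
        total += x * (2.0 * x) * h  # X * P(X) * dX
    return total

print(expected_value())  # close to 2/3
```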


Expectation of a Product of Random Variables

If X, Y are random variables with joint distribution P(X, Y), then the expectation of the product is given by

E(XY) = ∫∫ X Y P(X, Y) dX dY.


Expectation of a product of random variables

What if X and Y are independent? If X and Y are independent with density functions f and g respectively, then

E(XY) = ∫∫ X Y f(X) g(Y) dX dY
      = ∫ [∫ X Y f(X) g(Y) dY] dX
      = ∫ X f(X) [∫ Y g(Y) dY] dX
      = ∫ X f(X) E(Y) dX
      = E(X) E(Y)


Regression Function

- The response Yi comes from a probability distribution with mean

E(Yi) = β0 + β1Xi

- This means the regression function is

E(Y) = β0 + β1X

since the regression function relates the means of the probability distributions of Y for a given X to the level of X.


Error Terms

- The response Yi in the i-th trial exceeds or falls short of the value of the regression function by the error term amount εi
- The error terms εi are assumed to have constant variance σ²


Response Variance

The responses Yi have the same constant variance:

Var(Yi) = Var(β0 + β1Xi + εi)
        = Var(εi)
        = σ²


Variance (2nd central moment) Review

- Continuous distribution

Var(X) = E((X − E(X))²) = ∫ (X − E(X))² P(X) dX,  X ∈ R

- Discrete distribution

Var(X) = E((X − E(X))²) = Σi (Xi − E(X))² P(Xi),  X ∈ Z


Alternative Form for Variance

Var(X) = E((X − E(X))²)
       = E(X² − 2X E(X) + E(X)²)
       = E(X²) − 2 E(X) E(X) + E(X)²
       = E(X²) − 2 E(X)² + E(X)²
       = E(X²) − E(X)².


Example Variance Derivation

P(X) = 2X,  0 ≤ X ≤ 1

Var(X) = E((X − E(X))²) = E(X²) − E(X)²
       = ∫₀¹ 2X · X² dX − (2/3)²
       = (2X⁴/4) |₀¹ − 4/9
       = 1/2 − 4/9
       = 1/18
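The same midpoint-rule idea checks the variance numerically (again a sketch, not part of the original slides):

```python
# Midpoint-rule check of Var(X) = E(X^2) - E(X)^2 for P(X) = 2X on [0, 1].
def moment(p, n_steps=100_000):
    """Approximate E(X^p) = ∫_0^1 X^p * 2X dX by the midpoint rule."""
    h = 1.0 / n_steps
    return sum(((k + 0.5) * h) ** p * 2.0 * ((k + 0.5) * h) * h
               for k in range(n_steps))

var = moment(2) - moment(1) ** 2
print(var)  # close to 1/18 ≈ 0.0556
```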


Variance Properties

Var(aX) = a² Var(X)

Var(aX + bY) = a² Var(X) + b² Var(Y)  if X ⊥ Y

Var(a + cX) = c² Var(X)  if a and c are both constant

More generally,

Var(Σi aiXi) = Σi Σj ai aj Cov(Xi, Xj)


Covariance

- The covariance between two real-valued random variables X and Y, with expected values E(X) = μ and E(Y) = ν, is defined as

Cov(X, Y) = E((X − μ)(Y − ν))

- which can be rewritten as

Cov(X, Y) = E(XY − νX − μY + μν)
          = E(XY) − ν E(X) − μ E(Y) + μν
          = E(XY) − μν.


Covariance of Independent Variables

If X and Y are independent, then their covariance is zero. This follows because under independence

E(XY) = E(X) E(Y) = μν,

and then

Cov(X, Y) = μν − μν = 0.
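A small exact check with two independent discrete random variables (illustrative only; the values and probabilities here are made up, and the joint distribution is the product of the marginals precisely because of independence):

```python
# Two independent discrete distributions; joint = product of marginals.
px = {0: 0.3, 1: 0.7}    # P(X)
py = {-1: 0.4, 2: 0.6}   # P(Y)

ex = sum(x * p for x, p in px.items())                      # E(X)
ey = sum(y * p for y, p in py.items())                      # E(Y)
exy = sum(x * y * px[x] * py[y] for x in px for y in py)    # E(XY)

cov = exy - ex * ey
print(cov)  # 0, up to floating point
```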


Formal Statement of Model

Yi = β0 + β1Xi + εi

- Yi is the value of the response variable in the i-th trial
- β0 and β1 are parameters
- Xi is a known constant, the value of the predictor variable in the i-th trial
- εi is a random error term with mean E(εi) = 0 and variance Var(εi) = σ²
- i = 1, . . . , n


Example

An experimenter gave three subjects a very difficult task. Data onthe age of the subject (X ) and on the number of attempts toaccomplish the task before giving up (Y ) follow:

Table:

Subject i               1    2    3
Age Xi                 20   55   30
Number of Attempts Yi   5   12   10


Least Squares Linear Regression

- Seek to minimize

Q = Σᵢ₌₁ⁿ (Yi − (β0 + β1Xi))²

- Choose b0 and b1 as estimators for β0 and β1; b0 and b1 will minimize the criterion Q for the given sample observations (X1, Y1), (X2, Y2), . . . , (Xn, Yn).


Guess #1


Guess #2


Function maximization

- Important technique to remember!
  - Take the derivative
  - Set the result equal to zero and solve
  - Test the second derivative at that point

- Question: does this always give you the maximum?

- Going further: multiple variables, convex optimization


Function Maximization

Find the value of x that maximizes the function

f(x) = −x² + ln(x),  x > 0

1. Take the derivative and set it to zero:

d/dx (−x² + ln(x)) = 0    (1)
−2x + 1/x = 0             (2)
2x² = 1                   (3)
x² = 1/2                  (4)
x = √2/2                  (5)


2. Check second order derivative

2. Check the second-order derivative:

d²/dx² (−x² + ln(x)) = d/dx (−2x + 1/x)    (6)
                     = −2 − 1/x²           (7)
                     < 0                   (8)

Then we have found the maximum: x* = √2/2, f(x*) = −(1/2)[1 + ln(2)].
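The steps above can be verified numerically (a sketch, not part of the original slides): the derivative vanishes at x* and the second derivative is negative there.

```python
import math

# f(x) = -x^2 + ln(x); check that x* = sqrt(2)/2 is a stationary point
# and that the second derivative there is negative (a maximum).
def f(x):
    return -x * x + math.log(x)

def fprime(x):
    return -2.0 * x + 1.0 / x

def fsecond(x):
    return -2.0 - 1.0 / (x * x)

x_star = math.sqrt(2) / 2
print(fprime(x_star))   # ~0
print(fsecond(x_star))  # negative, so a maximum
print(f(x_star))        # equals -(1 + ln 2)/2
```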


Least Squares Max(min)imization

- Function to minimize w.r.t. b0 and b1 (b0 and b1 are called point estimators of β0 and β1, respectively):

Q = Σᵢ₌₁ⁿ (Yi − (b0 + b1Xi))²

- Minimize this by maximizing −Q

- Either way, find the partials and set both equal to zero:

dQ/db0 = 0
dQ/db1 = 0


Normal Equations

- The result of this maximization step is called the normal equations. b0 and b1 are called point estimators of β0 and β1, respectively.

Σ Yi = n b0 + b1 Σ Xi
Σ XiYi = b0 Σ Xi + b1 Σ Xi²

- This is a system of two equations in two unknowns. The solution is given by . . .


Solution to Normal Equations

After a lot of algebra one arrives at

b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

b0 = Ȳ − b1X̄

where

X̄ = Σ Xi / n,  Ȳ = Σ Yi / n
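Applying these formulas to the three-subject age/attempts example from earlier gives a concrete fit (a quick sketch, not from the slides):

```python
# Least squares fit for the age (X) vs. number-of-attempts (Y) example.
X = [20, 55, 30]
Y = [5, 12, 10]
n = len(X)

x_bar = sum(X) / n   # X-bar = 35
y_bar = sum(Y) / n   # Y-bar = 9

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
     sum((x - x_bar) ** 2 for x in X)
b0 = y_bar - b1 * x_bar

print(b1)  # 115/650 ≈ 0.1769
print(b0)  # ≈ 2.8077
```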


Least Squares Fit


Questions to Ask

- Is the relationship really linear?

- What is the distribution of the "errors"?

- Is the fit good?

- How much of the variability of the response is accounted for by including the predictor variable?

- Is the chosen predictor variable the best one?


Goals for First Half of Course

- How to do linear regression
  - Self-familiarization with software tools

- How to interpret standard linear regression results

- How to derive tests

- How to assess and address deficiencies in regression models


Estimators for β0, β1, σ2

- We want to establish properties of estimators for β0, β1, and σ² so that we can construct hypothesis tests and so forth

- We will start by establishing some properties of the regression solution.


Properties of Solution

- The i-th residual is defined to be

ei = Yi − Ŷi

- The sum of the residuals is zero:

Σi ei = Σ (Yi − b0 − b1Xi)
      = Σ Yi − n b0 − b1 Σ Xi
      = Σ Yi − n(Ȳ − b1X̄) − b1 Σ Xi
      = 0


Properties of Solution

The sum of the observed values Yi equals the sum of the fitted values Ŷi:

Σi Ŷi = Σi (b1Xi + b0)
      = Σi (b1Xi + Ȳ − b1X̄)
      = b1 Σi Xi + nȲ − b1nX̄
      = b1nX̄ + Σi Yi − b1nX̄
      = Σi Yi


Properties of Solution

The sum of the weighted residuals is zero when the residual in the i-th trial is weighted by the level of the predictor variable in the i-th trial:

Σi Xiei = Σ (Xi(Yi − b0 − b1Xi))
        = Σi XiYi − b0 Σ Xi − b1 Σ Xi²
        = 0

The last step is due to the second normal equation.


Properties of Solution

The sum of the weighted residuals is zero when the residual in the i-th trial is weighted by the fitted value of the response variable in the i-th trial:

Σi Ŷiei = Σi (b0 + b1Xi)ei
        = b0 Σi ei + b1 Σi Xiei
        = 0
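All three residual identities can be confirmed numerically on the age/attempts example (a sketch, not from the slides; the fit is recomputed from the least squares formulas):

```python
# Check the residual identities on the age/attempts example:
# sum(e_i) = 0, sum(X_i * e_i) = 0, sum(Yhat_i * e_i) = 0.
X = [20, 55, 30]
Y = [5, 12, 10]
n = len(X)

x_bar, y_bar = sum(X) / n, sum(Y) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
     sum((x - x_bar) ** 2 for x in X)
b0 = y_bar - b1 * x_bar

Y_hat = [b0 + b1 * x for x in X]          # fitted values
e = [y - yh for y, yh in zip(Y, Y_hat)]   # residuals

print(sum(e))                                    # ~0
print(sum(x * ei for x, ei in zip(X, e)))        # ~0
print(sum(yh * ei for yh, ei in zip(Y_hat, e)))  # ~0
```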


Properties of Solution

The regression line always goes through the point (X̄, Ȳ).

Using the alternative form of the linear regression model

Yi = β0* + β1(Xi − X̄) + εi

the least squares estimator b1 for β1 remains the same as before. The least squares estimator for β0* = β0 + β1X̄ becomes

b0* = b0 + b1X̄ = (Ȳ − b1X̄) + b1X̄ = Ȳ

Hence the estimated regression function is

Ŷ = Ȳ + b1(X − X̄)

Evidently, the regression line always goes through the point (X̄, Ȳ).


Estimating Error Term Variance σ2

- Review estimation in the non-regression setting.

- Show estimation results for the regression setting.


Estimation Review

- An estimator is a rule that tells how to calculate the value of an estimate based on the measurements contained in a sample

- e.g. the sample mean

Ȳ = (1/n) Σᵢ₌₁ⁿ Yi


Point Estimators and Bias

- Point estimator

θ̂ = f({Y1, . . . , Yn})

- Unknown quantity / parameter

θ

- Definition: bias of an estimator

B(θ̂) = E(θ̂) − θ


Example

- Samples from a Normal(μ, σ²) distribution

Yi ∼ Normal(μ, σ²)

- Estimate the population mean

θ = μ,  θ̂ = Ȳ = (1/n) Σᵢ₌₁ⁿ Yi


Sampling Distribution of the Estimator

- First moment

E(θ̂) = E((1/n) Σᵢ₌₁ⁿ Yi)
     = (1/n) Σᵢ₌₁ⁿ E(Yi) = nμ/n = θ

- This is an example of an unbiased estimator:

B(θ̂) = E(θ̂) − θ = 0


Variance of Estimator

- Definition: variance of an estimator

Var(θ̂) = E([θ̂ − E(θ̂)]²)

- Remember:

Var(cY) = c² Var(Y)

Var(Σᵢ₌₁ⁿ Yi) = Σᵢ₌₁ⁿ Var(Yi)

only if the Yi are independent with finite variance


Example Estimator Variance

- For the mean estimator of a Normal(μ, σ²) sample,

Var(θ̂) = Var((1/n) Σᵢ₌₁ⁿ Yi)
        = (1/n²) Σᵢ₌₁ⁿ Var(Yi) = nσ²/n² = σ²/n

- Note the assumptions
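A quick Monte Carlo check of Var(Ȳ) = σ²/n (an illustrative sketch, not from the slides; μ, σ, and the sample size are made-up values):

```python
import random
import statistics

# Monte Carlo check that Var(Ybar) ≈ sigma^2 / n for the sample mean
# of n i.i.d. Normal(mu, sigma^2) draws.
random.seed(0)
mu, sigma, n = 3.0, 2.0, 10
n_reps = 20_000

means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(n_reps)]
var_of_mean = statistics.pvariance(means)

print(var_of_mean)  # close to sigma^2 / n = 0.4
```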


Bias Variance Trade-off

- The mean squared error of an estimator is

MSE(θ̂) = E([θ̂ − θ]²)

- It can be re-expressed as

MSE(θ̂) = Var(θ̂) + (B(θ̂))²


MSE = VAR + BIAS²

Proof:

MSE(θ̂) = E((θ̂ − θ)²)
        = E(([θ̂ − E(θ̂)] + [E(θ̂) − θ])²)
        = E([θ̂ − E(θ̂)]²) + 2 E([E(θ̂) − θ][θ̂ − E(θ̂)]) + E([E(θ̂) − θ]²)
        = Var(θ̂) + 2 E(E(θ̂)[θ̂ − E(θ̂)] − θ[θ̂ − E(θ̂)]) + (B(θ̂))²
        = Var(θ̂) + 2(0 + 0) + (B(θ̂))²
        = Var(θ̂) + (B(θ̂))²


Trade-off

- Think of variance as confidence and bias as correctness.
  - Intuitions (largely) apply

- Sometimes choosing a biased estimator can result in an overall lower MSE if it exhibits lower variance.
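The trade-off can be seen in a small simulation (an illustrative sketch, not from the slides; the distribution, sample size, and repetition count are made-up choices): for normal data, the variance estimator that divides by n is biased but can achieve a lower MSE than the unbiased divide-by-(n − 1) estimator.

```python
import random
import statistics

# Compare MSE of two variance estimators: divide by (n-1) (unbiased)
# vs. divide by n (biased, but lower variance).
random.seed(1)
sigma2, n, n_reps = 1.0, 5, 50_000

mse_unbiased = mse_biased = 0.0
for _ in range(n_reps):
    ys = [random.gauss(0.0, 1.0) for _ in range(n)]
    ybar = statistics.fmean(ys)
    ss = sum((y - ybar) ** 2 for y in ys)
    mse_unbiased += (ss / (n - 1) - sigma2) ** 2
    mse_biased += (ss / n - sigma2) ** 2
mse_unbiased /= n_reps
mse_biased /= n_reps

print(mse_unbiased, mse_biased)  # the biased estimator has the smaller MSE here
```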


s2 estimator for σ2 for Single population

Sum of squares:

Σᵢ₌₁ⁿ (Yi − Ȳ)²

Sample variance estimator:

s² = Σᵢ₌₁ⁿ (Yi − Ȳ)² / (n − 1)

- s² is an unbiased estimator of σ².

- The sum of squares has n − 1 "degrees of freedom" associated with it; one degree of freedom is lost by using Ȳ as an estimate of the unknown population mean μ.


Estimating Error Term Variance σ2 for Regression Model

- Regression model

- The variance of each observation Yi is σ² (the same as for the error term εi)

- Each Yi comes from a different probability distribution, with a different mean that depends on the level Xi

- The deviation of an observation Yi must be calculated around its own estimated mean.


s2 estimator for σ2

s² = MSE = SSE/(n − 2) = Σ(Yi − Ŷi)² / (n − 2) = Σ ei² / (n − 2)

- MSE is an unbiased estimator of σ²:

E(MSE) = σ²

- The sum of squares SSE has n − 2 "degrees of freedom" associated with it.

- Cochran's theorem (later in the course) tells us where degrees of freedom come from and how to calculate them.


Normal Error Regression Model

- No matter how the error terms εi are distributed, the least squares method provides unbiased point estimators of β0 and β1
  - that also have minimum variance among all unbiased linear estimators

- To set up interval estimates and make tests, we need to specify the distribution of the εi

- We will assume that the εi are normally distributed.


Normal Error Regression Model

Yi = β0 + β1Xi + εi

- Yi is the value of the response variable in the i-th trial
- β0 and β1 are parameters
- Xi is a known constant, the value of the predictor variable in the i-th trial
- εi ∼iid N(0, σ²) (note this is different: now we know the distribution)
- i = 1, . . . , n


Notational Convention

- When you see εi ∼iid N(0, σ²),

- it is read as: εi is distributed identically and independently according to a normal distribution with mean 0 and variance σ²

- Examples
  - θ ∼ Poisson(λ)
  - z ∼ G(θ)


Maximum Likelihood Principle

The method of maximum likelihood chooses as estimates thosevalues of the parameters that are most consistent with the sampledata.


Likelihood Function

If

Xi ∼ F(Θ),  i = 1, . . . , n

then the likelihood function is

L({Xi}ᵢ₌₁ⁿ, Θ) = ∏ᵢ₌₁ⁿ F(Xi; Θ)


Maximum Likelihood Estimation

- The likelihood function can be maximized w.r.t. the parameter(s) Θ; doing this, one can arrive at estimators for parameters as well.

L({Xi}ᵢ₌₁ⁿ, Θ) = ∏ᵢ₌₁ⁿ F(Xi; Θ)

- To do this, find solutions to (analytically or by following the gradient)

dL({Xi}ᵢ₌₁ⁿ, Θ)/dΘ = 0


Important Trick

Never (almost) maximize the likelihood function; maximize the log-likelihood function instead:

log(L({Xi}ᵢ₌₁ⁿ, Θ)) = log(∏ᵢ₌₁ⁿ F(Xi; Θ))
                    = Σᵢ₌₁ⁿ log(F(Xi; Θ))

Quite often the log of the density is easier to work with mathematically.


ML Normal Regression

Likelihood function:

L(β0, β1, σ²) = ∏ᵢ₌₁ⁿ [1/(2πσ²)^(1/2)] exp(−(1/(2σ²))(Yi − β0 − β1Xi)²)
              = [1/(2πσ²)^(n/2)] exp(−(1/(2σ²)) Σᵢ₌₁ⁿ (Yi − β0 − β1Xi)²)

which, if you maximize (how?) w.r.t. the parameters, gives . . .
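A numeric sketch (not from the slides): on the age/attempts example, evaluating the normal-regression log-likelihood on a small grid of candidate (β0, β1) values shows it peaks exactly at the least squares estimates, since maximizing the likelihood is equivalent to minimizing Σ(Yi − β0 − β1Xi)². Here σ² is fixed at 1 as an illustrative choice.

```python
import math

# Log-likelihood of the normal error regression model on the example data.
X = [20, 55, 30]
Y = [5, 12, 10]

def log_lik(b0, b1, sigma2=1.0):
    n = len(X)
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, Y))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sse / (2 * sigma2)

# Least squares solution (from the normal equations).
x_bar, y_bar = sum(X) / len(X), sum(Y) / len(Y)
b1_ls = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
        sum((x - x_bar) ** 2 for x in X)
b0_ls = y_bar - b1_ls * x_bar

# The LS solution attains the highest log-likelihood on a grid around it.
best = max((log_lik(b0_ls + db0, b1_ls + db1), db0, db1)
           for db0 in (-0.5, -0.1, 0.0, 0.1, 0.5)
           for db1 in (-0.05, -0.01, 0.0, 0.01, 0.05))
print(best[1], best[2])  # 0.0 0.0 — the zero-offset point wins
```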


Maximum Likelihood Estimator(s)

- β̂0 = b0, the same as in the least squares case

- β̂1 = b1, the same as in the least squares case

- σ̂² = Σi (Yi − Ŷi)² / n

- Note that the ML estimator σ̂² is biased, since s² is unbiased and

s² = MSE = (n/(n − 2)) σ̂²
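The relation between the two estimators can be checked on the age/attempts example (a sketch, not from the slides):

```python
# Check s^2 = (n / (n - 2)) * sigma_hat^2 on the age/attempts example.
X = [20, 55, 30]
Y = [5, 12, 10]
n = len(X)

x_bar, y_bar = sum(X) / n, sum(Y) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
     sum((x - x_bar) ** 2 for x in X)
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
s2 = sse / (n - 2)    # unbiased MSE
sigma2_ml = sse / n   # biased ML estimator

print(s2, sigma2_ml, (n / (n - 2)) * sigma2_ml)  # last two columns agree
```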


Comments

- Least squares minimizes the squared error between the prediction and the true output

- The normal distribution is fully characterized by its first two central moments (mean and variance)