
Page 1: Statistical View of Regression a MATLAB Tutorial

Regression and Least Squares: A MATLAB Tutorial

Dr. Michael D. Porter ([email protected])

Department of Statistics
North Carolina State University
and SAMSI

Tuesday May 20, 2008

1 / 54

Page 2

Introduction to Regression

Goal: Express the relationship between two (or more) variables by a mathematical formula.

x is the predictor (independent) variable; y is the response (dependent) variable.

We specifically want to indicate how y varies as a function of x.

y(x) is considered a random variable, so it can never be predicted perfectly.

2 / 54

Page 3

Example: Relating Shoe Size to Height
The problem

Footwear impressions are commonly observed at crime scenes. While there are numerous forensic properties that can be obtained from these impressions, one in particular is the shoe size. The detectives would like to be able to estimate the height of the impression maker from the shoe size.

3 / 54

Page 4

Example: Relating Shoe Size to Height
The data

[Figure: scatter plot "Determining Height from Shoe Size"; Shoe Size (Mens) on the x-axis, Height (in) on the y-axis]

Data taken from: http://staff.imsa.edu/~brazzle/E2Kcurr/Forensic/Tracks/TracksSummary.html

4 / 54

Page 5

Example: Relating Shoe Size to Height
Your answers

[Figure: scatter plot "Determining Height from Shoe Size"; Shoe Size (Mens) on the x-axis, Height (in) on the y-axis]

1. What is the predictor? What is the response?

5 / 54

Page 6

Example: Relating Shoe Size to Height
Your answers

[Figure: scatter plot "Determining Height from Shoe Size"; Shoe Size (Mens) on the x-axis, Height (in) on the y-axis]

1. What is the predictor? What is the response?

2. Can the height of the impression maker be accurately estimated from the shoe size?

6 / 54

Page 7

Example: Relating Shoe Size to Height
Your answers

[Figure: scatter plot "Determining Height from Shoe Size"; Shoe Size (Mens) on the x-axis, Height (in) on the y-axis]

1. What is the predictor? What is the response?

2. Can the height of the impression maker be accurately estimated from the shoe size?

3. If a shoe is size 11, what would you advise the police?

7 / 54

Page 8

Example: Relating Shoe Size to Height
Your answers

[Figure: scatter plot "Determining Height from Shoe Size"; Shoe Size (Mens) on the x-axis, Height (in) on the y-axis]

1. What is the predictor? What is the response?

2. Can the height of the impression maker be accurately estimated from the shoe size?

3. If a shoe is size 11, what would you advise the police?

4. What if the size is 7? Size 12.5?

8 / 54

Page 9

General Regression Model

Assume the true model is of the form:

y(x) = m(x) + ε(x)

The systematic part, m(x), is deterministic.
The error, ε(x), is a random variable:
- measurement error
- natural variation due to exogenous factors
Therefore, y(x) is also a random variable.

The error is additive.

9 / 54

Page 10

Example: Sinusoid Function

y(x) = A · sin(ωx + φ) + ε(x)

A = 1; ω = π/2; φ = π; σ = 0.5

[Figure: plot of y(x) (noisy points) and m(x) (smooth curve) for x from 0 to 10, labeled with the amplitude A, angular frequency ω, phase φ, and random error ε(x) ∼ N(0, σ²)]
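As a minimal MATLAB sketch (variable names are our own), data like the figure's can be regenerated from the slide's parameter values:

```matlab
% Simulate the slide's noisy sinusoid: y(x) = A*sin(w*x + phi) + eps(x),
% with A = 1, w = pi/2, phi = pi, and N(0, sigma^2) errors, sigma = 0.5.
A = 1; w = pi/2; phi = pi; sigma = 0.5;
x = linspace(0, 10, 200)';          % grid of x values
m = A * sin(w*x + phi);             % systematic part m(x)
y = m + sigma * randn(size(x));     % y(x) = m(x) + random error
plot(x, y, '.', x, m, '-');
legend('y(x)', 'm(x)'); xlabel('x'); ylabel('y(x)');
```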

10 / 54

Page 11

Regression Modeling

We want to estimate m(x) and possibly the distribution of ε(x).

There are two general situations:

Theoretical Models: m(x) is of some known (or hypothesized) form but with some parameters unknown (e.g. the sinusoid function with A, ω, φ unknown).

Empirical Models: m(x) is constructed from the observed data (e.g. shoe size and height).

We often end up using both: constructing models from the observed data and prior knowledge.

11 / 54

Page 12

The Standard Assumptions

y(x) = m(x) + ε(x)

A1: E[ε(x)] = 0 for all x (Mean 0)
A2: Var[ε(x)] = σ² for all x (Homoskedastic)
A3: Cov[ε(x), ε(x′)] = 0 for all x ≠ x′ (Uncorrelated)

These assumptions are only on the error term:

ε(x) = y(x) − m(x)

12 / 54

Page 13

Residuals

The residuals

e(xi) = y(xi) − m̂(xi)

can be used to check the estimated model, m̂(x).

If the model fit is good, the residuals should satisfy our three assumptions.
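A minimal MATLAB sketch of this check, on simulated data (the true line and noise level below are made up for illustration):

```matlab
% Fit a line and inspect the residuals e(x_i) = y(x_i) - mhat(x_i).
x = linspace(0, 1, 100)';
y = 2 + 3*x + 0.5*randn(size(x));   % simulated data: line plus noise
p = polyfit(x, y, 1);               % least squares line fit
e = y - polyval(p, x);              % residuals
plot(x, e, 'o'); hold on;
plot([0 1], [0 0], 'k-');           % reference line at zero
xlabel('x'); ylabel('e(x)');
% A good fit shows a patternless band around zero (A1-A3 plausible).
```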

13 / 54

Page 14

A1 - Mean 0

[Figure: two residual plots of e(x) vs. x; the left panel violates A1 (residuals not centered at zero), the right panel satisfies A1]

14 / 54

Page 15

A2 - Constant Variance

[Figure: two residual plots of e(x) vs. x; the left panel violates A2 (spread changes with x), the right panel satisfies A2]

15 / 54

Page 16

A3 - Uncorrelated

[Figure: two residual plots of e(x) vs. x; the left panel violates A3 (residuals follow a smooth pattern), the right panel satisfies A3]

16 / 54

Page 17

Back to the Shoes

How can we estimate m(x) for the shoe example?

(Non-parametric): For each shoe size, take the mean of the observed heights.
(Parametric): Assume the trend is linear.

[Figure: shoe-size scatter plot "Determining Height from Shoe Size" with both estimates overlaid: the local mean and a linear trend]

17 / 54

Page 18

Simple Linear Regression

Simple linear regression assumes that m(x) is of the parametric form

m(x) = β0 + β1x

which is the equation for a line.

18 / 54

Page 19

Simple Linear Regression

Which line is the best estimate?

[Figure: shoe-size scatter plot with three candidate lines]

m(x) = β0 + β1x

          β0     β1
Line #1   48.6   1.9
Line #2   51.5   1.6
Line #3   45.0   2.3

19 / 54

Page 20

Estimating Parameters in Linear Regression
Data

Write the observed data:

yi = β0 + β1xi + εi,   i = 1, 2, . . . , n

where

yi ≡ y(xi) is the response value for observation i
β0 and β1 are the unknown parameters (regression coefficients)
xi is the predictor value for observation i
εi ≡ ε(xi) is the random error for observation i

20 / 54

Page 21

Estimating Parameters in Linear Regression
Statistical Decision Theory

Let g(x) ≡ g(x; β) be an estimator for y(x).

Define a Loss Function, L(y(x), g(x)), which describes how far g(x) is from y(x).

Example: Squared Error Loss

L(y(x), g(x)) = (y(x) − g(x))²

The best predictor minimizes the Risk (or expected Loss):

R(x) = E[L(y(x), g(x))]

g∗(x) = arg min_{g∈G} E[L(y(x), g(x))]

21 / 54

Page 22

Estimating Parameters in Linear Regression
Method of Least Squares

If we assume a squared error loss function

L(yi, mi) = (yi − (β0 + β1xi))²

an approximation to the Risk function is the Sum of Squared Errors (SSE):

R(β0, β1) = ∑_{i=1}^{n} (yi − (β0 + β1xi))²

Then it makes sense to estimate (β0, β1) as the values that minimize R(β0, β1):

(β̂0, β̂1) = arg min_{β0, β1} R(β0, β1)

22 / 54

Page 23

Estimating Parameters in Linear Regression
Derivation of Linear Least Squares Solution

R(β0, β1) = ∑_{i=1}^{n} (yi − (β0 + β1xi))²

Differentiate the Risk function with respect to the unknown parameters and equate to 0:

∂R/∂β0 = −2 ∑_{i=1}^{n} (yi − (β0 + β1xi)) = 0

∂R/∂β1 = −2 ∑_{i=1}^{n} xi (yi − (β0 + β1xi)) = 0

23 / 54

Page 24

Estimating Parameters in Linear Regression
Linear Least Squares Solution

R(β0, β1) = ∑_{i=1}^{n} (yi − (β0 + β1xi))²

The least squares estimates are

β̂1 = (∑_{i=1}^{n} xiyi − n x̄ ȳ) / (∑_{i=1}^{n} xi² − n x̄²)

β̂0 = ȳ − β̂1 x̄

where x̄ and ȳ are the sample means of the xi's and yi's.
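In MATLAB, these formulas can be applied directly. A sketch on hypothetical data (these vectors are illustrative, not the slide's actual shoe data):

```matlab
% Closed-form simple linear regression (formulas from this slide).
shoe   = [9; 10; 11; 11; 12; 13];     % hypothetical shoe sizes
height = [66; 68; 70; 71; 72; 74];    % hypothetical heights (in)
n    = length(shoe);
xbar = mean(shoe);
ybar = mean(height);
b1 = (sum(shoe .* height) - n*xbar*ybar) / (sum(shoe.^2) - n*xbar^2);
b0 = ybar - b1*xbar;
% polyfit returns the same line: p(1) is the slope, p(2) the intercept.
p = polyfit(shoe, height, 1);
```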

24 / 54

Page 25

And the winner is ...

Line #2!

[Figure: shoe-size scatter plot with the three candidate lines]

For these data: x̄ = 11.03, ȳ = 69.31

β̂0 = 51.46
β̂1 = 1.62

25 / 54

Page 26

Residuals

The fitted value, ŷi, for the ith observation is

ŷi = β̂0 + β̂1xi

The residual, ei, is the difference between the observed and fitted value:

ei = yi − ŷi

The residuals are used to check whether our three assumptions appear valid.

26 / 54

Page 27

Residuals for shoe size data

[Figure: residual plot "Determining Height from Shoe Size"; residuals vs. Shoe Size (Mens)]

27 / 54

Page 28

Example of poor fit

[Figure: scatter plot of y(x) vs. x alongside the residual plot e(x) vs. x for a linear fit]

28 / 54

Page 29

Adding Polynomial Terms in the Linear Model

Modeling the mean trend as a line doesn't seem to fit extremely well in the above example. There is a systematic lack of fit.

Consider a polynomial form for the mean:

m(x) = β0 + β1x + β2x² + . . . + βpx^p = ∑_{k=0}^{p} βk x^k

This is still considered a linear model: m(x) is a linear combination of the βk.

Danger of over-fitting.
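One way to see both the lack of fit and the over-fitting danger in MATLAB, on simulated data (the true quadratic trend below is our own choice for illustration):

```matlab
% Compare under-, well-, and over-fitted polynomial models with polyfit.
x  = linspace(-1, 1, 30)';
y  = 4*x.^2 + 0.5*randn(size(x));     % true quadratic trend plus noise
xg = linspace(-1, 1, 200)';           % fine grid for plotting fits
p1  = polyfit(x, y, 1);               % under-fit: systematic lack of fit
p2  = polyfit(x, y, 2);               % matches the true trend
p10 = polyfit(x, y, 10);              % over-fit: chases the noise
plot(x, y, 'o', xg, polyval(p2, xg), '-', xg, polyval(p10, xg), '--');
legend('data', 'quadratic', 'degree 10');
```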

29 / 54

Page 30

Quadratic Fit: y(x) = β0 + β1x + β2x² + ε(x)

[Figure: scatter plots comparing the 1st-order and quadratic fits, with the corresponding residual plots for each]

30 / 54

Page 31

Matrix Approach to Linear Least Squares
Setup

Previously, we wrote our data as yi = ∑_{k=0}^{p} βk xi^k + εi. In matrix notation this becomes

Y = Xβ + ε

where

Y = [y1, y2, . . . , yn]ᵀ
X is the n × (p + 1) matrix with ith row [1, xi, xi², . . . , xi^p]
β = [β0, β1, . . . , βp]ᵀ
ε = [ε1, ε2, . . . , εn]ᵀ

How many unknown parameters are in the model?

31 / 54

Page 32

Matrix Approach to Linear Least Squares
Solution

To minimize the SSE (Sum of Squared Errors), use the Risk function

R(β) = (Y − Xβ)ᵀ(Y − Xβ)

Taking the derivative w.r.t. β gives the Normal Equations

XᵀXβ = XᵀY

The least squares solution for β is ...
Hint: See "Linear Inverse Problems: A MATLAB Tutorial" by Qin Zhang

32 / 54

Page 33

Matrix Approach to Linear Least Squares
Solution

To minimize the SSE (Sum of Squared Errors), use the Risk function

R(β) = (Y − Xβ)ᵀ(Y − Xβ)

Taking the derivative w.r.t. β gives the Normal Equations

XᵀXβ = XᵀY

The least squares solution for β is

β̂ = (XᵀX)⁻¹XᵀY
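A sketch of the matrix solution in MATLAB on simulated quadratic data; in practice the backslash operator is preferred over forming (XᵀX)⁻¹ explicitly:

```matlab
% Least squares via the normal equations and via backslash.
x = linspace(-1, 1, 50)';
y = 1 + 2*x + 3*x.^2 + 0.3*randn(size(x));  % simulated data, p = 2
X = [ones(size(x)) x x.^2];                 % n-by-(p+1) design matrix
beta_ne = (X'*X) \ (X'*y);   % solve the normal equations X'X b = X'y
beta_bs = X \ y;             % QR-based solve: same answer, stabler
```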

33 / 54

Page 34

STRETCH BREAK!!!

34 / 54

Page 35

MATLAB Demonstration
Linear Least Squares

MATLAB Demo #1: Open Regression_Intro.m

35 / 54

Page 36

Model Selection

How can we compare and select a final model?

How many terms should be included in polynomial models?

What is the danger of over-fitting (including too many terms)?

What is the problem with under-fitting (not including enough terms)?

36 / 54

Page 37

Estimating Variance

Recall assumptions A1, A2, and A3.

For our fitted model, the residuals ei = yi − ŷi can be used to estimate Var[ε(x)].

An estimator for the variance is ...
Hint: See "Basic Statistical Concepts and Some Probability Essentials" by Justin Shows and Betsy Enstrom

37 / 54

Page 38

Estimating Variance

Recall assumptions A1, A2, and A3.

For our fitted model, the residuals ei = yi − ŷi can be used to estimate Var[ε(x)].

An estimator for the variance is ...
Hint: See "Basic Statistical Concepts and Some Probability Essentials" by Justin Shows and Betsy Enstrom

The Sample Variance

s²z = (1/(n − 1)) ∑_{i=1}^{n} (zi − z̄)²

38 / 54

Page 39

Estimating Variance

Sample Variance for a rv z:

s²z = (1/(n − 1)) ∑_{i=1}^{n} (zi − z̄)²

The estimator for the regression problem is similar:

σ̂²ε = (1/(n − (p + 1))) ∑_{i=1}^{n} ei² = SSE/df

where the degrees of freedom df = n − (p + 1). There are p + 1 unknown parameters in the model.
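A short MATLAB sketch of this estimator on simulated data (the line and noise level are invented for illustration):

```matlab
% Estimate sigma^2 as SSE / df with df = n - (p+1).
x = (1:20)';
y = 5 + 0.8*x + randn(20, 1);    % simulated straight-line data
p = 1;                           % polynomial degree (simple linear model)
coef = polyfit(x, y, p);
e   = y - polyval(coef, x);      % residuals
SSE = sum(e.^2);
df  = length(y) - (p + 1);       % degrees of freedom
sigma2_hat = SSE / df;
```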

39 / 54

Page 40

Statistical Inference
An additional assumption

In order to calculate confidence intervals (C.I.), we need a distributional assumption on ε(x).

Up to now, we haven't needed one.

The standard assumption is a Normal or Gaussian distribution:

A4: ε(x) ∼ N(0, σ²)

40 / 54

Page 41

Statistical Inference
Distributions

Using

y(x0) = x0ᵀβ + ε(x0)
y(x0) ∼ N(x0ᵀβ, σ²)
β̂ = (XᵀX)⁻¹XᵀY

where x0 is a point in design space, and the four assumptions, we find

m̂(x0) ∼ N(x0ᵀβ, σ² x0ᵀ(XᵀX)⁻¹x0)
ŷ(x0) ∼ N(x0ᵀβ, σ²(1 + x0ᵀ(XᵀX)⁻¹x0))
β̂ ∼ MVN(β, σ²(XᵀX)⁻¹)

From these we can find C.I.'s and perform hypothesis tests.
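As a sketch, an approximate 95% C.I. for m(x0) follows from the first distribution above, with σ² replaced by its estimate (simulated data; 1.96 is the standard normal quantile):

```matlab
% Approximate 95% C.I. for m(x0) at a design point x0.
x = linspace(0, 1, 30)';
y = 1 + 2*x + 0.3*randn(size(x));   % simulated straight-line data
X = [ones(size(x)) x];
beta = X \ y;                        % least squares fit
e  = y - X*beta;
s2 = sum(e.^2) / (length(y) - 2);    % sigma^2 estimate, df = n - 2
x0 = [1; 0.5];                       % design point: intercept and x = 0.5
se_m = sqrt(s2 * (x0' * ((X'*X) \ x0)));   % std. error of mhat(x0)
ci = x0'*beta + [-1 1] * 1.96 * se_m;      % lower and upper limits
```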

41 / 54

Page 42

Model Comparison
R²

Sum of Squares Error:

SSE = ∑_{i=1}^{n} (yi − ŷi)² = ∑_{i=1}^{n} ei² = eᵀe

Sum of Squares Total:

SST = ∑_{i=1}^{n} (yi − ȳ)²

SST is the SSE of the model with intercept only, ŷ(x) = ȳ.

Coefficient of Determination:

R² = 1 − SSE/SST

R² is a measure of how much better a regression model is than the intercept only.

42 / 54

Page 43

Model Comparison
Adjusted R²

What happens to R² if you add more terms to the model?

R² = 1 − SSE/SST

43 / 54

Page 44

Model Comparison
Adjusted R²

What happens to R² if you add more terms to the model?

R² = 1 − SSE/SST

Adjusted R² penalizes by the number of terms (p + 1) in the model:

R²adj = 1 − [SSE/(n − (p + 1))] / [SST/(n − 1)] = 1 − σ̂²ε / (SST/(n − 1))

Also see residual plots, Mallows' Cp, PRESS (cross-validation), AIC, etc.
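Both quantities are a few lines in MATLAB (simulated data for illustration):

```matlab
% Compute R^2 and adjusted R^2 for a degree-p polynomial fit.
x = linspace(0, 1, 40)';
y = 2 + x + 0.2*randn(size(x));   % simulated data
p = 1;
coef = polyfit(x, y, p);
e   = y - polyval(coef, x);
SSE = sum(e.^2);
SST = sum((y - mean(y)).^2);
n   = length(y);
R2     = 1 - SSE/SST;
R2_adj = 1 - (SSE/(n - (p+1))) / (SST/(n - 1));
```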

44 / 54

Page 45

MATLAB Demonstration
cftool

MATLAB Demo #2: Type cftool

45 / 54

Page 46

Nonlinear Regression

A linear regression model can be written

y(x) = ∑_{k=0}^{p} βk hk(x) + ε(x)

The mean, m(x), is a linear combination of the β's.

Nonlinear regression takes the general form

y(x) = m(x; β) + ε(x)

for some specified function m(x; β) with unknown parameters β.

46 / 54

Page 47

Nonlinear Regression

A linear regression model can be written

y(x) = ∑_{k=0}^{p} βk hk(x) + ε(x)

The mean, m(x), is a linear combination of the β's.

Nonlinear regression takes the general form

y(x) = m(x; β) + ε(x)

for some specified function m(x; β) with unknown parameters β.

Example

The sinusoid we looked at earlier,

y(x) = A · sin(ωx + φ) + ε(x)

with parameters β = (A, ω, φ), is a nonlinear model.

47 / 54

Page 48

Nonlinear Regression
Parameter Estimation

Making the same assumptions as in linear regression (A1-A3), the least squares solution is still valid:

β̂ = arg min_β ∑_{i=1}^{n} (yi − m(xi; β))²

Unfortunately, this usually doesn't have a closed form solution (like in the linear case).

Approaches to finding the solution will be discussed later in the workshop.

But that won't stop us from using nonlinear (and nonparametric) regression in MATLAB!

48 / 54

Page 49

Off again to cftool

MATLAB Demo #3

49 / 54

Page 50

Weighted Regression

Consider the risk function we have used so far:

R(β) = ∑_{i=1}^{n} (yi − m(xi; β))²

Each observation contributes equally to the risk.

Weighted regression uses the risk function

Rw(β) = ∑_{i=1}^{n} wi (yi − m(xi; β))²

so observations with larger weights are more important. Some examples:

wi = 1/σi²   Heteroskedastic (non-constant variance)
wi = 1/xi
wi = 1/yi
wi = k/|ei|   Robust Regression
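For a linear model, a standard trick is to scale each row by √wi and then run ordinary least squares; a sketch on simulated heteroskedastic data (weights and noise model are our own choices):

```matlab
% Weighted least squares: scaling row i by sqrt(w_i) makes ordinary
% least squares minimize sum_i w_i (y_i - m(x_i; beta))^2.
x = linspace(1, 10, 50)';
y = 2 + 0.5*x + 0.1*x.*randn(size(x));   % noise grows with x
w = 1 ./ x.^2;                           % e.g. w_i = 1/sigma_i^2
X = [ones(size(x)) x];
sw = sqrt(w);
beta_w = (X .* [sw sw]) \ (y .* sw);     % scaled design and response
% lscov(X, y, w) returns the same weighted estimate directly.
```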

50 / 54

Page 51

Transformations

Sometimes transformations are used to obtain better models:

Transform predictors: x → x′
Transform response: y → y′

Make sure assumptions A1-A3 (and A4) are still valid.

Standardize: x′ = (x − x̄)/sx
Log: y′ = log(y)
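Both transformations in MATLAB, on made-up data chosen so that the log transform visibly linearizes the trend:

```matlab
% Standardize a predictor and log-transform a response.
x = [6; 7; 8; 9; 10; 11];
y = [12; 25; 49; 110; 230; 480];  % roughly exponential growth
x_std = (x - mean(x)) / std(x);   % standardized predictor
y_log = log(y);                   % log response
p = polyfit(x, y_log, 1);         % line fit on the transformed scale
```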

51 / 54

Page 52

The Competition

A contest to see who can construct the best model in cftool:

Get into groups.
Data can be found in competition data.m.
Scoring will be performed on a testing set.
The goal is to minimize the sum of squared errors.
When your group is ready, enter the model into this computer.

52 / 54

Page 53

MATLAB Help

There is lots of good assistance in the MATLAB help window.

Specifically, look at the Demos tab of the help window.

The Statistics (Regression) and Optimization Toolboxes may be particularly useful for this workshop.

53 / 54

Page 54

Have a great workshop!

54 / 54