Lecture 1a: Linear regression with one predictor variable


Upload: uzuri

Post on 15-Feb-2016



Page 1: Lecture 1a: Linear regression with  one predictor variable

732G21/732A35/732G28 1

Lecture 1a:

Linear regression with one predictor variable

Page 2: Lecture 1a: Linear regression with  one predictor variable

Course structure

732G21 Sambandsmodeller, http://www.ida.liu.se/~732G21
One semester = regression analysis + analysis of variance (teacher: Lotta Hallberg)

732G28 Regression methods, http://www.ida.liu.se/~732G28
Half a semester = regression analysis

732A35 Linear statistical models, http://www.ida.liu.se/~732A22
Almost one semester = regression analysis + analysis of variance (teacher: Lotta Hallberg)

Page 3: Lecture 1a: Linear regression with  one predictor variable

Course structure (regression part)

Course language: English, but you may use Swedish
We use It’s learning (accessed via the Student portal) (show…)
9 lectures
8 labs (computer); deadlines are around 5 days after each lab ends
8 lessons: I solve problems on the whiteboard, plus lab discussion
One written final exam
Course book: Kutner, M.H., Nachtsheim, C.J., Neter, J. and Li, W. Applied Linear Statistical Models with Student Data CD, 5th Edition, ISBN 0073108742.

Page 4: Lecture 1a: Linear regression with  one predictor variable

Regression analysis

Linear statistical models are widely used in
◦ Business
◦ Economics
◦ Engineering
◦ Social, biological sciences
◦ Etc.

Example: A database contains the prices of houses sold in Linköping in 2009, together with their age, size and other parameters.
◦ Given the parameters of a new house, determine its approximate market price
◦ Determine reasonable price bounds

Page 5: Lecture 1a: Linear regression with  one predictor variable

What we analyse

Analysis of databases
Observations (records, cases) in rows
Variables in columns
◦ Explanatory variables (predictors, inputs) Xi
◦ Response Y, we assume Y=f(X1,…,Xn)
In this lecture, models with only one explanatory variable

No   Area (X1)   Age (X2)   Price (Y)
1    320         14         2,530,000
2    210         1          1,800,000
…    …           …          …

Page 6: Lecture 1a: Linear regression with  one predictor variable

Statistical relation and functional relation

Real data can seldom be represented exactly as Y=βX (observation errors, missing inputs, etc.)

Example: Age and salary for a sample of eight persons from a company.

Age   Salary
21    17
32    30
40    27
56    35
61    44
55    38
39    36
33    25

[Figure: scatterplot of Salary (y) against Age (x)]

Page 7: Lecture 1a: Linear regression with  one predictor variable

Statistical relation and functional relation

The relation shown is almost linear
Linear regression analysis: find a linear function as close as possible to the data

[Figure: scatterplot of Salary (y) against Age (x) with fitted line y = 0.5471x + 8.4545]

Page 8: Lecture 1a: Linear regression with  one predictor variable

Regression models

For each value x of X, there is a probability distribution P(Y=y|X=x) of Y

The aim is to find the regression function E(Y|X=x)

Page 9: Lecture 1a: Linear regression with  one predictor variable

Regression models

Construction of regression models
◦ Selection of predictor variables (variance reduction)
◦ Functional form (from theory, approximation)
◦ Domain of the model

Software
◦ MINITAB
◦ SAS
◦ SPSS
◦ Matlab
◦ Excel

Page 10: Lecture 1a: Linear regression with  one predictor variable

Simple linear model

Formal statement

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

$Y_i$ is the $i$-th response value
$\beta_0$, $\beta_1$ are the model parameters, or regression parameters (intercept, slope)
$X_i$ is the $i$-th predictor value
$\varepsilon_i$ are i.i.d. random variables with expectation zero and variance $\sigma^2$

Page 11: Lecture 1a: Linear regression with  one predictor variable

Simple linear model

Features (show…)

$$E(Y_i) = \beta_0 + \beta_1 X_i$$

$$\sigma^2(Y_i) = \sigma^2$$

All $Y_i$ and $Y_j$, $i \neq j$, are uncorrelated

Meaning of regression parameters
◦ $\beta_0$: expected response at $X = 0$
◦ $\beta_1$: change in $E(Y)$ per unit increase in $X$

Page 12: Lecture 1a: Linear regression with  one predictor variable

Estimation of regression function

Given data set $S = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$

Method of least squares:
◦ Observed response: $Y_i$
◦ Estimated response: $\beta_0 + \beta_1 X_i$
◦ Deviation: $Y_i - \beta_0 - \beta_1 X_i$

The regression fit is good when all deviations are small (see picture) -> minimize the sum of squares

$$Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$
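As a rough illustration of the least-squares criterion, the sketch below evaluates Q for the age/salary sample of eight persons shown earlier. The function name `sum_of_squares` and the alternative line (intercept 10, slope 0.5) are illustrative choices made here, not part of the course material.

```python
# Age/salary sample of eight persons from the scatterplot slide.
ages = [21, 32, 40, 56, 61, 55, 39, 33]
salaries = [17, 30, 27, 35, 44, 38, 36, 25]


def sum_of_squares(beta0, beta1, xs, ys):
    """Criterion Q: sum of squared deviations Y_i - (beta0 + beta1 * X_i)."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))


# Q for the fitted line reported on the slides (rounded coefficients).
q_fitted = sum_of_squares(8.4545, 0.5471, ages, salaries)

# Q for an arbitrary alternative line, chosen only for comparison.
q_other = sum_of_squares(10.0, 0.5, ages, salaries)

print(f"Q(fitted line) = {q_fitted:.2f}")
print(f"Q(other line)  = {q_other:.2f}")
```

Any other choice of intercept and slope can be tried in the same way; the least-squares line is the one for which Q cannot be made smaller.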

Page 13: Lecture 1a: Linear regression with  one predictor variable

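Estimation of regression function

How to find the minimum of Q?

$$\frac{\partial Q}{\partial \beta_0} = 0, \qquad \frac{\partial Q}{\partial \beta_1} = 0$$

Estimators of $\beta_0$ and $\beta_1$:

$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

$$b_0 = \bar{Y} - b_1 \bar{X}$$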
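The step from setting the partial derivatives of Q to zero to the estimators is not written out on the slide. One standard way to fill it in, sketched here with the same symbols, is to differentiate $Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$ and write $b_0$, $b_1$ for the solution:

```latex
\begin{align*}
\frac{\partial Q}{\partial \beta_0}
  &= -2 \sum_{i=1}^{n} \bigl(Y_i - \beta_0 - \beta_1 X_i\bigr) = 0
  &&\Longrightarrow\quad \sum_{i=1}^{n} Y_i = n b_0 + b_1 \sum_{i=1}^{n} X_i, \\
\frac{\partial Q}{\partial \beta_1}
  &= -2 \sum_{i=1}^{n} X_i \bigl(Y_i - \beta_0 - \beta_1 X_i\bigr) = 0
  &&\Longrightarrow\quad \sum_{i=1}^{n} X_i Y_i = b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2 .
\end{align*}
% Dividing the first normal equation by n gives the intercept:
\[
  b_0 = \bar{Y} - b_1 \bar{X}.
\]
% Substituting b_0 into the second normal equation and collecting terms:
\[
  \sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}
  = b_1 \Bigl( \sum_{i=1}^{n} X_i^2 - n \bar{X}^2 \Bigr)
  \quad\Longrightarrow\quad
  b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.
\]
```

The last step uses the identities $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - n \bar{X} \bar{Y}$ and $\sum (X_i - \bar{X})^2 = \sum X_i^2 - n \bar{X}^2$.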

Page 14: Lecture 1a: Linear regression with  one predictor variable

Estimation of regression function

Exercise (for the salary data, MINITAB):
1. Make a scatterplot ("Scatterplot…", with and without regression line)
2. Perform regression using "Regression…"
3. Perform regression using "Fitted line plot.."
4. Calculate the coefficients by hand
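For the last item of the exercise, the hand calculation can be mirrored in a few lines of plain Python. This is a minimal sketch using the closed-form least-squares estimators; the function name `least_squares_fit` is an illustrative choice, and the data are the eight age/salary pairs from the example.

```python
# Age/salary sample of eight persons from the scatterplot slide.
ages = [21, 32, 40, 56, 61, 55, 39, 33]
salaries = [17, 30, 27, 35, 44, 38, 36, 25]


def least_squares_fit(xs, ys):
    """Return (b0, b1) from the closed-form least-squares estimators."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross products and sum of squares around the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1


b0, b1 = least_squares_fit(ages, salaries)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

The result can be compared with the fitted line y = 0.5471x + 8.4545 reported on the slides and with the MINITAB output.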

Page 15: Lecture 1a: Linear regression with  one predictor variable

Estimation of regression function

[Figure: scatterplot of Salary (y) against Age (x) with fitted line y = 0.5471x + 8.4545]

Page 16: Lecture 1a: Linear regression with  one predictor variable

Estimation of regression function

Gauss-Markov theorem
Estimators b0 and b1 are unbiased and have minimum variance among all unbiased linear estimators

Unbiased: bias = E(b0) − β0 = 0, i.e. E(b0) = β0. Analogously, E(b1) = β1

Show illustration…
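Unbiasedness can be made tangible with a small simulation: draw many samples from a simple linear model with known parameters, fit each one, and average the estimates. The sketch below makes several assumptions that are not in the slides: the "true" values `BETA0`, `BETA1` and `SIGMA` are invented for the demonstration, the errors are drawn from a normal distribution (the model itself only requires zero mean and constant variance), and the predictor values are borrowed from the salary example. It illustrates unbiasedness only, not the minimum-variance part of the theorem.

```python
import random

# Hypothetical "true" parameters, chosen only for this simulation.
BETA0, BETA1, SIGMA = 8.0, 0.5, 4.0
# Predictor values are held fixed; here the ages from the salary example.
xs = [21, 32, 40, 56, 61, 55, 39, 33]


def least_squares_fit(xs, ys):
    """Return (b0, b1) from the closed-form least-squares estimators."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def average_estimates(n_replications, seed=0):
    """Average b0 and b1 over repeated samples from the simple linear model."""
    rng = random.Random(seed)
    sum_b0 = sum_b1 = 0.0
    for _ in range(n_replications):
        # New responses: same X values, fresh random errors.
        ys = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in xs]
        b0, b1 = least_squares_fit(xs, ys)
        sum_b0 += b0
        sum_b1 += b1
    return sum_b0 / n_replications, sum_b1 / n_replications


mean_b0, mean_b1 = average_estimates(20000)
print(f"average b0 = {mean_b0:.3f} (true value {BETA0})")
print(f"average b1 = {mean_b1:.3f} (true value {BETA1})")
```

Individual estimates vary from sample to sample, but their averages should settle close to the true values as the number of replications grows.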

Page 17: Lecture 1a: Linear regression with  one predictor variable

Estimation of regression function

Mean (expected response):
$$E(Y) = \beta_0 + \beta_1 X$$

Point estimator of mean response (fitted value):
$$\hat{Y} = b_0 + b_1 X$$

Residuals:
$$e_i = Y_i - \hat{Y}_i$$

Page 18: Lecture 1a: Linear regression with  one predictor variable

Estimation of regression function

Plot of residuals (obtain it with MINITAB)

[Figure: residuals plotted against Age]

Page 19: Lecture 1a: Linear regression with  one predictor variable

Estimation of regression function

Properties of residuals
1. $\sum_{i=1}^{n} e_i = 0$ (because $\partial Q / \partial \beta_0 = 0$)
2. $\sum_{i=1}^{n} e_i^2$ is the minimum possible
3. $\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \hat{Y}_i$ (because of 1)
4. $\sum_{i=1}^{n} X_i e_i = 0$, $\sum_{i=1}^{n} \hat{Y}_i e_i = 0$ (can be shown)
5. The regression line always goes through $(\bar{X}, \bar{Y})$
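The properties of the residuals can be checked numerically on the salary data. The following sketch fits the least-squares line and evaluates the sums that appear in properties 1, 3, 4 and 5; the variable names are illustrative, and the sums are expected to be zero only up to floating-point rounding.

```python
# Age/salary sample of eight persons from the scatterplot slide.
ages = [21, 32, 40, 56, 61, 55, 39, 33]
salaries = [17, 30, 27, 35, 44, 38, 36, 25]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(salaries) / n

# Closed-form least-squares estimators.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, salaries))
s_xx = sum((x - x_bar) ** 2 for x in ages)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in ages]
residuals = [y - y_hat for y, y_hat in zip(salaries, fitted)]

sum_e = sum(residuals)                                        # property 1
sum_fitted = sum(fitted)                                      # property 3
sum_xe = sum(x * e for x, e in zip(ages, residuals))          # property 4
sum_fitted_e = sum(f * e for f, e in zip(fitted, residuals))  # property 4
line_at_x_bar = b0 + b1 * x_bar                               # property 5

print(f"sum of residuals       = {sum_e:.2e}")
print(f"sum Y vs. sum fitted   = {sum(salaries)} vs. {sum_fitted:.6f}")
print(f"sum X_i * e_i          = {sum_xe:.2e}")
print(f"sum fitted_i * e_i     = {sum_fitted_e:.2e}")
print(f"line at mean X vs. mean Y = {line_at_x_bar:.6f} vs. {y_bar}")
```

Property 2 is the least-squares criterion itself, so it is not checked separately here.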

Page 20: Lecture 1a: Linear regression with  one predictor variable

Estimation of error term variance

Estimate of the variance of a single population (sample variance):
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

In regression, we compute $s^2$ using residuals (look at the residual plot):
$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$$
$$s^2 = MSE = \frac{SSE}{n-2}$$

Page 21: Lecture 1a: Linear regression with  one predictor variable

Estimation of error term variance

Why divide by n−2? Because then E(MSE) = σ²

Important: in general, the unbiased estimator is
$$s^2 = MSE = \frac{SSE}{n-d}$$
where $d$ is the number of model parameters, so that $n - d$ is the number of degrees of freedom

Example: compute the residuals, SSE and MSE, and find them in the MINITAB output
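The example on this slide can also be worked through outside MINITAB. The sketch below computes the residuals, SSE and MSE for the salary data; the function name `error_variance_estimate` and its `n_params` argument (the d of the formula, set to 2 for the simple linear model) are illustrative choices.

```python
# Age/salary sample of eight persons from the scatterplot slide.
ages = [21, 32, 40, 56, 61, 55, 39, 33]
salaries = [17, 30, 27, 35, 44, 38, 36, 25]


def error_variance_estimate(xs, ys, n_params=2):
    """Return (SSE, MSE) for the least-squares line, with MSE = SSE / (n - n_params)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residuals e_i = Y_i - (b0 + b1 * X_i).
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    mse = sse / (n - n_params)
    return sse, mse


sse, mse = error_variance_estimate(ages, salaries)
print(f"SSE = {sse:.3f}")
print(f"MSE = {mse:.3f}")
print(f"s   = {mse ** 0.5:.3f}")
```

The square root of MSE is the estimated standard deviation of the error term, in the same units as the response.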

Page 22: Lecture 1a: Linear regression with  one predictor variable

Simple regression using software

Minitab
◦ Graph → Scatterplot
◦ Stat → Regression
◦ Stat → Fitted Line Plot

Page 23: Lecture 1a: Linear regression with  one predictor variable

Reading

Course book, Ch. 1 up to page 27.