
Simple Linear Regression

1

Chapter 2. Simple Linear Regression

Regression Analysis

Study a functional relationship between variables

- response variable $y$, $Y$ (dependent variable)

- explanatory variable $x$, $X$ (independent variable)

To explain the variability of Y ,

Simple Linear Regression

2

Simple linear regression model (2.4)

$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$  ($x_1, \ldots, x_n$: non-random)

$\varepsilon_1, \ldots, \varepsilon_n$: independent random errors

$E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$  ($i = 1, \ldots, n$)

(additional assumption: $\varepsilon_i \sim N(0, \sigma^2)$) $\to$ inference

Method of estimation (2.5)

- minimize $\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$ w.r.t. $\beta_0$ and $\beta_1$

- normal equations:

$\sum_{i=1}^n e_i = 0$, $\sum_{i=1}^n x_i e_i = 0$, where $e_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$ (residual)

Simple Linear Regression

3

($\Rightarrow \sum (x_i - \bar x)\, e_i = 0$)

- least squares estimates: $\hat\beta_1 = S_{xy} / S_{xx}$, $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$

where $S_{xy} = \sum (x_i - \bar x)(y_i - \bar y)$, $S_{xx} = \sum (x_i - \bar x)^2$

- least squares regression fit: $\hat y = \hat\beta_0 + \hat\beta_1 x = \bar y + \hat\beta_1 (x - \bar x)$
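The closed-form estimates above can be checked with a short Python sketch (illustrative only; the slides themselves use SAS). The three data points $(1,1), (1,2), (2,2)$ reappear in the geometry example of Supplement I.

```python
# Least squares estimates for simple linear regression:
# beta1_hat = S_xy / S_xx, beta0_hat = ybar - beta1_hat * xbar

def least_squares(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    S_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    S_xx = sum((xi - xbar) ** 2 for xi in x)
    beta1 = S_xy / S_xx
    beta0 = ybar - beta1 * xbar
    return beta0, beta1

# Three points (x, y) = (1, 1), (1, 2), (2, 2)
beta0, beta1 = least_squares([1, 1, 2], [1, 2, 2])
print(beta0, beta1)  # fitted line: y_hat = 1 + 0.5 x
```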

Simple Linear Regression

4

(2.6)

$\hat\sigma^2 = \frac{1}{n-2} \sum (y_i - \hat y_i)^2$, where $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$

$= \frac{SSE}{n-2}$  (SSE: residual sum of squares (error sum of squares))

($n-2$: degrees of freedom, df)

Example (Computer Repair Data, 2.3)

data (n=14)

scatter plot: simple linear regression seems O.K.

model setting : eq. (2.10)

estimated l.s. line eq. (2.19) with residuals (Table 2.7)

estimated error variance: eq. (2.23)

Simple Linear Regression

5

$\hat\beta_1 = 15.509$, $\hat\beta_0 = 4.162$, $\hat\sigma^2 = 5.392^2$, $\hat y = 4.162 + 15.509\,x$
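These numbers can be reproduced directly from the raw data listed in the SAS input program at the end of the chapter (a Python sketch, not the course's SAS code):

```python
# Computer Repair Data (n = 14): units repaired vs. minutes taken
units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
xbar = sum(units) / n
ybar = sum(minutes) / n
S_xx = sum((x - xbar) ** 2 for x in units)
S_xy = sum((x - xbar) * (y - ybar) for x, y in zip(units, minutes))

beta1 = S_xy / S_xx                    # slope estimate
beta0 = ybar - beta1 * xbar            # intercept estimate
resid = [y - beta0 - beta1 * x for x, y in zip(units, minutes)]
sigma2_hat = sum(e ** 2 for e in resid) / (n - 2)   # SSE / (n - 2)
print(round(beta1, 3), round(beta0, 3), round(sigma2_hat ** 0.5, 3))
# 15.509  4.162  5.392
```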

Simple Linear Regression

6

Simple Linear Regression

7

Method of inference ($\beta_0$, $\beta_1$)

(1) Properties of estimates

i. $E(\hat\beta_1) = \beta_1$, $Var(\hat\beta_1) = \sigma^2 / S_{xx}$

ii. $E(\hat\beta_0) = \beta_0$, $Var(\hat\beta_0) = \sigma^2 (n^{-1} + \bar x^2 / S_{xx})$

iii. $Cov(\hat\beta_0, \hat\beta_1) = -\sigma^2 \bar x / S_{xx}$

iv. $E(e_i) = 0$, $Var(e_i) = \sigma^2 (1 - p_{ii})$, $Cov(e_i, e_j) = -\sigma^2 p_{ij}$ $(i \ne j)$

where $p_{ij} = \dfrac{1}{n} + \dfrac{(x_i - \bar x)(x_j - \bar x)}{S_{xx}}$

Simple Linear Regression

8

(2) Inference under additional normality assumption

i. $\dfrac{\hat\beta_1 - \beta_1}{s.e.(\hat\beta_1)} \sim t(n-2)$; $s.e.(\hat\beta_1) = \sqrt{\widehat{Var}(\hat\beta_1)} = \hat\sigma / \sqrt{S_{xx}}$

- $100(1-\alpha)\%$ C.I.: $[\,\hat\beta_1 - t(n-2; \alpha/2)\, s.e.(\hat\beta_1),\ \hat\beta_1 + t(n-2; \alpha/2)\, s.e.(\hat\beta_1)\,]$

- Reject $H_0: \beta_1 = \beta_1^0$ in favor of $H_1: \beta_1 \ne \beta_1^0$ iff $\dfrac{|\hat\beta_1 - \beta_1^0|}{s.e.(\hat\beta_1)} \ge t(n-2; \alpha/2)$

- p-value

ii. $\dfrac{\hat\beta_0 - \beta_0}{s.e.(\hat\beta_0)} \sim t(n-2)$; $s.e.(\hat\beta_0) = \sqrt{\widehat{Var}(\hat\beta_0)} = \hat\sigma\, (n^{-1} + \bar x^2 / S_{xx})^{1/2}$

- $100(1-\alpha)\%$ C.I.: $[\,\hat\beta_0 - t(n-2; \alpha/2)\, s.e.(\hat\beta_0),\ \hat\beta_0 + t(n-2; \alpha/2)\, s.e.(\hat\beta_0)\,]$

- Reject $H_0: \beta_0 = \beta_0^0$ in favor of $H_1: \beta_0 \ne \beta_0^0$ iff $\dfrac{|\hat\beta_0 - \beta_0^0|}{s.e.(\hat\beta_0)} \ge t(n-2; \alpha/2)$
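The t-statistic and C.I. for $\beta_1$ can be sketched in Python for the Computer Repair data (illustrative; the critical value $t(12; 0.025) = 2.179$ is taken from a standard t-table rather than computed):

```python
import math

# t-test of H0: beta1 = 0 and 95% C.I. for beta1, Computer Repair data
units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]
n = len(units)
xbar, ybar = sum(units) / n, sum(minutes) / n
S_xx = sum((x - xbar) ** 2 for x in units)
S_xy = sum((x - xbar) * (y - ybar) for x, y in zip(units, minutes))
beta1 = S_xy / S_xx
beta0 = ybar - beta1 * xbar
SSE = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(units, minutes))
sigma_hat = math.sqrt(SSE / (n - 2))

se_beta1 = sigma_hat / math.sqrt(S_xx)       # s.e.(beta1_hat)
t_stat = beta1 / se_beta1                    # test of H0: beta1 = 0
t_crit = 2.179                               # t(12; 0.025), from a t-table
ci = (beta1 - t_crit * se_beta1, beta1 + t_crit * se_beta1)
print(round(t_stat, 2), [round(c, 2) for c in ci])
# 30.71  [14.41, 16.61]
```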

Simple Linear Regression

9

iii. $\mu_0 = E(Y \mid x_0) = \beta_0 + \beta_1 x_0$

$\hat\mu_0 = \hat\beta_0 + \hat\beta_1 x_0$

$\dfrac{\hat\mu_0 - \mu_0}{s.e.(\hat\mu_0)} \sim t(n-2)$; $s.e.(\hat\mu_0) = \hat\sigma \left( \dfrac{1}{n} + \dfrac{(x_0 - \bar x)^2}{S_{xx}} \right)^{1/2}$

- $100(1-\alpha)\%$ C.I.: $[\,\hat\mu_0 - t(n-2; \alpha/2)\, s.e.(\hat\mu_0),\ \hat\mu_0 + t(n-2; \alpha/2)\, s.e.(\hat\mu_0)\,]$

- Test (not given)

iv. Prediction for $y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0$  ($\varepsilon_0$: indep. of $\varepsilon_1, \ldots, \varepsilon_n$)

$\hat y_0 = \hat\beta_0 + \hat\beta_1 x_0$

$\dfrac{y_0 - \hat y_0}{s.e.(y_0 - \hat y_0)} \sim t(n-2)$; $s.e.(y_0 - \hat y_0) = \hat\sigma \left( 1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar x)^2}{S_{xx}} \right)^{1/2}$

$100(1-\alpha)\%$ prediction interval:

$[\,\hat y_0 - t(n-2; \alpha/2)\, s.e.(y_0 - \hat y_0),\ \hat y_0 + t(n-2; \alpha/2)\, s.e.(y_0 - \hat y_0)\,]$

** Note that $\hat\mu_0$ is identical to the predicted response $\hat y_0$ at any given $x_0$.
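Both intervals at $x_0 = 4$ can be computed for the Computer Repair data (a Python sketch; $t(12; 0.025) = 2.179$ from a t-table). Note how the two intervals share the same center $\hat\mu_0 = \hat y_0$ but differ in width:

```python
import math

# 95% C.I. for mu_0 = E(Y | x = 4) and 95% P.I. for a new y at x = 4
units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]
n = len(units)
xbar, ybar = sum(units) / n, sum(minutes) / n
S_xx = sum((x - xbar) ** 2 for x in units)
beta1 = sum((x - xbar) * (y - ybar) for x, y in zip(units, minutes)) / S_xx
beta0 = ybar - beta1 * xbar
SSE = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(units, minutes))
sigma_hat = math.sqrt(SSE / (n - 2))
t_crit = 2.179                               # t(12; 0.025), from a t-table

x0 = 4
mu0_hat = beta0 + beta1 * x0                 # = y0_hat: same point estimate
se_mean = sigma_hat * math.sqrt(1 / n + (x0 - xbar) ** 2 / S_xx)
se_pred = sigma_hat * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / S_xx)
ci = (mu0_hat - t_crit * se_mean, mu0_hat + t_crit * se_mean)
pi = (mu0_hat - t_crit * se_pred, mu0_hat + t_crit * se_pred)
print(ci, pi)  # the P.I. is strictly wider than the C.I.
```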

Simple Linear Regression

10

Example (Computer Repair Data) (cont'd)

Test of significance (of explanatory variable)

$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \ne 0$

- $t = \hat\beta_1 / s.e.(\hat\beta_1) = 30.71$ (Table 2.9)

p-value / meaning: we've seen data which can hardly be observed under $H_0$

- We may reject $H_0$

95% C.I. for $\beta_1$

95% C.I. for $\mu_4 = \beta_0 + 4\beta_1$

95% P.I. for $y_4 = \beta_0 + 4\beta_1 + \varepsilon$ ($\varepsilon$: indep. of $\varepsilon_1, \ldots, \varepsilon_{14}$) (wider than the C.I.)

- All these are valid under the model assumptions $\to$ need to check them! (Chapter 4)

- Note that $H_0: \beta_0 = 0$ vs. $H_1: \beta_0 \ne 0$ can't be rejected even at the 10% level (Table 2.9)

Meaning: we may start with the simpler model $y_i = \beta_1 x_i + \varepsilon_i$, $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$

Then all the above inferences should be changed!

Simple Linear Regression

11

Measuring the quality of fit

i. Decomposition of sums of squares:

deviation: $y_i - \bar y = (y_i - \hat y_i) + (\hat y_i - \bar y)$

sum of squares: $\sum (y_i - \bar y)^2 = \sum (y_i - \hat y_i)^2 + \sum (\hat y_i - \bar y)^2$

SST = SSE + SSR

(d.f.): $(n-1)$, $(n-2)$, $(1)$

(cross term: $\sum_{i=1}^n (y_i - \hat y_i)(\hat y_i - \bar y) = \sum e_i\, \hat\beta_1 (x_i - \bar x) = \hat\beta_1 \left( \sum x_i e_i - \bar x \sum e_i \right) = 0$, using $\hat y_i - \bar y = \hat\beta_1 (x_i - \bar x)$)

(*) SSR $= \sum (\hat y_i - \bar y)^2 = \hat\beta_1^2 \sum (x_i - \bar x)^2 = \dfrac{S_{xy}^2}{S_{xx}}$

ii. Coefficient of determination (or multiple correlation coeff.)

$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$, $0 \le R^2 \le 1$

$R^2$: proportion of variation of $y$ explained by $x$

Simple Linear Regression

12

Example (Computer Repair Data)

        s.s.        d.f.    R^2
Reg.    27419.500     1     0.987
Err.      348.848    12
Total   27768.348    13
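The decomposition SST = SSE + SSR and the $R^2$ in the table can be reproduced from the raw data (a Python sketch; values agree with the table up to rounding):

```python
# Sum-of-squares decomposition and R^2 for the Computer Repair data
units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]
n = len(units)
xbar, ybar = sum(units) / n, sum(minutes) / n
S_xx = sum((x - xbar) ** 2 for x in units)
beta1 = sum((x - xbar) * (y - ybar) for x, y in zip(units, minutes)) / S_xx
beta0 = ybar - beta1 * xbar
y_hat = [beta0 + beta1 * x for x in units]

SST = sum((y - ybar) ** 2 for y in minutes)           # total s.s.
SSE = sum((y - yh) ** 2 for y, yh in zip(minutes, y_hat))  # error s.s.
SSR = sum((yh - ybar) ** 2 for yh in y_hat)           # regression s.s.
R2 = SSR / SST
print(round(SSR, 1), round(SSE, 1), round(SST, 1), round(R2, 3))
# SST = SSE + SSR holds exactly; R^2 is about 0.987
```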

Simple Linear Regression

13

Supplement I (ch.2)

(1) Geometry of Least Squares Method

- minimize $\sum_{i=1}^n \{ y_i - (\beta_0 + \beta_1 x_i) \}^2$ w.r.t. $\beta_0$ & $\beta_1$

$\Leftrightarrow$ minimize $\| \mathbf{y} - (\beta_0 \mathbf{1} + \beta_1 \mathbf{x}) \|^2$ w.r.t. $\beta_0$ & $\beta_1$

where $\mathbf{y} = (y_1, \ldots, y_n)^T$, $\mathbf{1} = (1, \ldots, 1)^T$, $\mathbf{x} = (x_1, \ldots, x_n)^T$: column vectors

- Example

$(x, y) = (1,1), (1,2), (2,2)$ $\Rightarrow$ $\mathbf{1} = (1,1,1)^T$, $\mathbf{x} = (1,1,2)^T$, $\mathbf{y} = (1,2,2)^T$

Simple Linear Regression

14

$\hat{\mathbf{y}} = X \hat\beta$ with $X = [\mathbf{1}\ \ \mathbf{x}]$; $X^T (\mathbf{y} - X \hat\beta) = 0$ $\Rightarrow$ $\hat\beta = (X^T X)^{-1} X^T \mathbf{y}$

i.e. $\hat{\mathbf{y}} = X \hat\beta = X (X^T X)^{-1} X^T \mathbf{y}$: projection of $\mathbf{y}$ onto $C(X)$ ($X$'s column space)

Simple Linear Regression

15

(*1) $\bar y \mathbf{1} = \mathbf{1} (\mathbf{1}^T \mathbf{1})^{-1} \mathbf{1}^T \mathbf{y}$: projection of $\mathbf{y}$ onto $\mathbf{1}$

(*2) $\hat\beta_1 (\mathbf{x} - \bar x \mathbf{1}) = (\mathbf{x} - \bar x \mathbf{1}) \{ (\mathbf{x} - \bar x \mathbf{1})^T (\mathbf{x} - \bar x \mathbf{1}) \}^{-1} (\mathbf{x} - \bar x \mathbf{1})^T \mathbf{y}$: projection of $\mathbf{y}$ onto $\mathbf{x} - \bar x \mathbf{1}$  ($\because (\mathbf{x} - \bar x \mathbf{1})^T \mathbf{1} = 0$)

$\hat{\mathbf{y}} = \hat\beta_0 \mathbf{1} + \hat\beta_1 \mathbf{x} = \bar y \mathbf{1} + \hat\beta_1 (\mathbf{x} - \bar x \mathbf{1})$, where $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$
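The projection view can be sketched numerically with the three-point example of the previous slide (illustrative Python, not the slides' SAS):

```python
import numpy as np

# Least squares as projection: y_hat = X (X^T X)^{-1} X^T y,
# with X = [1 x] from the three-point example (1,1), (1,2), (2,2).
one = np.ones(3)
x = np.array([1.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 2.0])
X = np.column_stack([one, x])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # normal equations
y_hat = X @ beta_hat                           # projection of y onto C(X)
resid = y - y_hat

print(beta_hat)            # [1.0, 0.5]: beta0_hat = 1, beta1_hat = 0.5
print(X.T @ resid)         # ~0: residual is orthogonal to C(X)
```

The orthogonality check $X^T(\mathbf{y} - \hat{\mathbf{y}}) = 0$ is exactly the pair of normal equations $\sum e_i = 0$, $\sum x_i e_i = 0$.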

Simple Linear Regression

16

SST $= \sum (y_i - \bar y)^2$, SSE $= \sum (y_i - \hat y_i)^2$, SSR $= \sum (\hat y_i - \bar y)^2$

$\cos^2 \theta = \dfrac{SSR}{SST} = R^2 \to 1$ (as $\theta \to 0$)

$\Leftrightarrow$ $\mathbf{y}$ gets closer to the plane $C(\mathbf{1}, \mathbf{x})$, which is determined by $\mathbf{1}, \mathbf{x}$

Simple Linear Regression

17

(2) Properties of Variance & Covariance of random variables

$Cov(Y, Z) = E(Y - EY)(Z - EZ)$

$Cov\left( \sum_{i=1}^m a_i Y_i,\ \sum_{j=1}^n b_j Z_j \right) = \sum_{i=1}^m \sum_{j=1}^n a_i b_j\, Cov(Y_i, Z_j)$

$Var\left( \sum_{i=1}^n a_i Y_i \right) = \sum_{i=1}^n a_i^2\, Var(Y_i) + \sum_{i \ne j} a_i a_j\, Cov(Y_i, Y_j)$

$Y, Z$ indep. $\Rightarrow$ $Cov(Y, Z) = 0$

$Y_1, \ldots, Y_n$ indep. $\Rightarrow$ $Var\left( \sum_{i=1}^n a_i Y_i \right) = \sum_{i=1}^n a_i^2\, Var(Y_i)$

(3) Expectation, Variance & Covariance of random vectors

For a random vector $Y = (Y_1, \ldots, Y_n)^T$ (column vector notation),

$EY = (EY_1, \ldots, EY_n)^T$ (mean vector),

$Var(Y) = [Cov(Y_i, Y_j)] = \begin{pmatrix} Var(Y_1) & Cov(Y_1, Y_2) & \cdots & Cov(Y_1, Y_n) \\ Cov(Y_2, Y_1) & Var(Y_2) & & \vdots \\ \vdots & & \ddots & \vdots \\ Cov(Y_n, Y_1) & \cdots & \cdots & Var(Y_n) \end{pmatrix}$ (variance-covariance matrix)

Simple Linear Regression

18

Note that

(*1) $Var(Y) = E(Y - EY)(Y - EY)^T$

($(i,j)$ element: $Cov(Y_i, Y_j) = E(Y_i - EY_i)(Y_j - EY_j)$; recall $(a a^T)_{ij} = a_i a_j$)

(*2) For $A = (a_{ij})$, $b = (b_i)$: constants, $(AY + b)_i = \sum_{j=1}^n a_{ij} Y_j + b_i$, and

$E(AY + b) = A\,E(Y) + b$, $Var(AY + b) = Var(AY) = A\, Var(Y)\, A^T$:

$E(AY + b)_i = E\left( \sum_{j=1}^n a_{ij} Y_j + b_i \right) = \sum_{j=1}^n a_{ij}\, EY_j + b_i = (A\,EY + b)_i$

$Var(AY + b) = E\{AY + b - E(AY + b)\}\{AY + b - E(AY + b)\}^T$

$= E\{A(Y - EY)\}\{A(Y - EY)\}^T$

$= E\, A (Y - EY)(Y - EY)^T A^T$

$= A\, E(Y - EY)(Y - EY)^T\, A^T = A\, Var(Y)\, A^T$

In the simple (or multiple) linear regression model, $E(Y) = X\beta$, $Var(Y) = \sigma^2 I_n$
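Applying $Var(AY) = A\,Var(Y)\,A^T$ with $A = (X^T X)^{-1} X^T$ and $Var(Y) = \sigma^2 I_n$ gives $Var(\hat\beta) = \sigma^2 (X^T X)^{-1}$, which contains the scalar formulas of slide 7. A numerical sketch with the earlier three-point example ($\sigma^2 = 1$ assumed purely for illustration):

```python
import numpy as np

# var(AY + b) = A var(Y) A^T applied to beta_hat = (X^T X)^{-1} X^T Y.
# Three-point example; sigma^2 = 1 is an arbitrary illustrative choice.
x = np.array([1.0, 1.0, 2.0])
X = np.column_stack([np.ones(3), x])
sigma2 = 1.0

A = np.linalg.inv(X.T @ X) @ X.T        # beta_hat = A Y
var_Y = sigma2 * np.eye(3)              # var(Y) = sigma^2 I_n
var_beta = A @ var_Y @ A.T              # A var(Y) A^T
print(var_beta)                         # equals sigma^2 (X^T X)^{-1}

# Matches the scalar formulas of slide 7:
n, xbar = 3, x.mean()
S_xx = ((x - xbar) ** 2).sum()
print(sigma2 / S_xx)                         # var(beta1_hat)
print(sigma2 * (1 / n + xbar ** 2 / S_xx))   # var(beta0_hat)
print(-sigma2 * xbar / S_xx)                 # cov(beta0_hat, beta1_hat)
```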

Simple Linear Regression

19

(4) Gradient vector

For $(n \times 1)$ vectors $x = (x_1, \ldots, x_n)^T$, $c = (c_1, \ldots, c_n)^T$: $c^T x = \sum_{i=1}^n c_i x_i$

Partial derivative of $c^T x$ w.r.t. $x$:

$\dfrac{\partial (c^T x)}{\partial x} = \left( \dfrac{\partial (c^T x)}{\partial x_1}, \ldots, \dfrac{\partial (c^T x)}{\partial x_n} \right)^T = c$. Similarly, $\dfrac{\partial (x^T c)}{\partial x} = c$.

For any matrix $A$ and $y = (y_1, \ldots, y_n)^T$,

$\dfrac{\partial (y^T A y)}{\partial y} = (A + A^T)\, y$. When $A$ is symmetric, $\dfrac{\partial (y^T A y)}{\partial y} = 2 A y$.

($\because$ $y^T A y = \sum_{l=1}^n \sum_{k=1}^n y_l\, a_{lk}\, y_k$)

Simple Linear Regression

20

$\dfrac{\partial (y^T A y)}{\partial y_i} = \sum_{k \ne i} a_{ik} y_k + \sum_{l \ne i} a_{li} y_l + 2 a_{ii} y_i = \sum_{k=1}^n a_{ik} y_k + \sum_{l=1}^n a_{li} y_l = \{(A + A^T)\, y\}_i$  ($i = 1, \ldots, n$)
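The gradient identity can be checked by finite differences (a Python sketch with an arbitrary non-symmetric matrix, so the $A + A^T$ form matters):

```python
import numpy as np

# Check d(y^T A y)/dy = (A + A^T) y by central finite differences.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [4.0, 0.0, 2.0]])   # deliberately non-symmetric
y = np.array([1.0, -2.0, 0.5])

f = lambda v: v @ A @ v            # quadratic form y^T A y
h = 1e-6
num_grad = np.array([
    (f(y + h * e) - f(y - h * e)) / (2 * h)   # central difference
    for e in np.eye(3)
])
exact = (A + A.T) @ y
print(num_grad, exact)             # the two gradients agree
```

Central differences are exact for quadratics up to rounding, so the agreement here is essentially to machine precision.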

(5) Properties of Least Squares Estimates

$Y_i$: independent, $x_1, \ldots, x_n$: constants

$EY_i = \beta_0 + \beta_1 x_i$, $Var(Y_i) = \sigma^2$  ($i = 1, \ldots, n$)

$\hat\beta_1 = \dfrac{S_{xY}}{S_{xx}} = \dfrac{1}{S_{xx}} \sum_{i=1}^n (x_i - \bar x)(Y_i - \bar Y) = \sum_{i=1}^n \dfrac{(x_i - \bar x)}{S_{xx}}\, Y_i$  ($\because \sum (x_i - \bar x)\, \bar Y = 0$)

$\hat\beta_0 = \bar Y - \hat\beta_1 \bar x = \sum_{i=1}^n \left\{ \dfrac{1}{n} - \dfrac{(x_i - \bar x)\, \bar x}{S_{xx}} \right\} Y_i$

$e_i = Y_i - \hat\beta_0 - \hat\beta_1 x_i = Y_i - \bar Y - \hat\beta_1 (x_i - \bar x) = Y_i - \sum_{j=1}^n \left\{ \dfrac{1}{n} + \dfrac{(x_i - \bar x)(x_j - \bar x)}{S_{xx}} \right\} Y_j$

i. $E(\hat\beta_1) = \sum_{i=1}^n \dfrac{(x_i - \bar x)}{S_{xx}}\, EY_i = \sum_{i=1}^n \dfrac{(x_i - \bar x)}{S_{xx}}\, (\beta_0 + \beta_1 x_i)$

Simple Linear Regression

21

$= \beta_1$  ($\because \sum (x_i - \bar x) = 0$, $\sum (x_i - \bar x)\, x_i = \sum (x_i - \bar x)^2 = S_{xx}$)

$Var(\hat\beta_1) = \sum_{i=1}^n \left\{ \dfrac{x_i - \bar x}{S_{xx}} \right\}^2 Var(Y_i) = \dfrac{\sigma^2}{S_{xx}}$  ($Y_1, \ldots, Y_n$: indep., $Var(Y_i) = \sigma^2$)

ii. $E(\hat\beta_0) = \sum_{i=1}^n \left\{ \dfrac{1}{n} - \dfrac{(x_i - \bar x)\, \bar x}{S_{xx}} \right\} EY_i = \sum_{i=1}^n \left\{ \dfrac{1}{n} - \dfrac{(x_i - \bar x)\, \bar x}{S_{xx}} \right\} (\beta_0 + \beta_1 x_i) = \beta_0$

$Var(\hat\beta_0) = \sum_{i=1}^n \left\{ \dfrac{1}{n} - \dfrac{(x_i - \bar x)\, \bar x}{S_{xx}} \right\}^2 Var(Y_i) = \sigma^2 \sum_{i=1}^n \left\{ \dfrac{1}{n^2} - \dfrac{2 (x_i - \bar x)\, \bar x}{n\, S_{xx}} + \dfrac{(x_i - \bar x)^2\, \bar x^2}{S_{xx}^2} \right\} = \sigma^2 \left( \dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}} \right)$

iii. $Cov(\hat\beta_0, \hat\beta_1) = Cov\left( \sum_{i=1}^n \left\{ \dfrac{1}{n} - \dfrac{(x_i - \bar x)\, \bar x}{S_{xx}} \right\} Y_i,\ \sum_{j=1}^n \dfrac{(x_j - \bar x)}{S_{xx}}\, Y_j \right)$

Simple Linear Regression

22

$= \sigma^2 \sum_{i=1}^n \left\{ \dfrac{1}{n} - \dfrac{(x_i - \bar x)\, \bar x}{S_{xx}} \right\} \dfrac{(x_i - \bar x)}{S_{xx}} = -\dfrac{\sigma^2\, \bar x}{S_{xx}}$  ($\because Cov(Y_i, Y_j) = 0$ for $i \ne j$)
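The unbiasedness and variance of $\hat\beta_1 = \sum_i \frac{(x_i - \bar x)}{S_{xx}} Y_i$ can also be checked by simulation (an illustrative sketch: $\beta_0 = 4$, $\beta_1 = 15$, $\sigma = 5$ are made-up values, and $x$ is the units vector from the Computer Repair data):

```python
import numpy as np

# Monte Carlo check: E(beta1_hat) = beta1, var(beta1_hat) = sigma^2 / S_xx
rng = np.random.default_rng(0)
x = np.array([1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10], dtype=float)
beta0, beta1, sigma = 4.0, 15.0, 5.0      # made-up true parameters
S_xx = ((x - x.mean()) ** 2).sum()
w = (x - x.mean()) / S_xx                 # beta1_hat = sum_i w_i Y_i

n_sim = 20000
Y = beta0 + beta1 * x + sigma * rng.standard_normal((n_sim, x.size))
beta1_hats = Y @ w                        # one estimate per simulated sample

print(beta1_hats.mean())     # ~ 15 (unbiased)
print(beta1_hats.var())      # ~ sigma^2 / S_xx = 25/114
```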

Simple Linear Regression

23

Computer Repair Data

1. Input Program

Data repair;

Input units minutes @@;

Cards;

1 23 2 29 3 49 4 64 4 74 5 87 6 96 6 97 7 109 8 119 9 149 9 145 10 154 10 166

;

run;

Simple Linear Regression

24

2. Scatter plot and Linear regression line

symbol1 interpol = RL c=black h=1 v=dot;

axis1 minor=none order=(0,40,80,120,160);

axis2 minor=none order=(0,2,4,6,8,10);

proc gplot data=repair;

plot minutes*units / haxis=axis2 vaxis=axis1;

run;

[Figure: scatter plot of minutes (0 to 160) vs. units (0 to 10) with the fitted regression line]

Simple Linear Regression

25

3. Regression Analysis

proc reg data=repair;

model minutes = units;

run;

proc reg data=repair;

model minutes = units /noint;

run;

Simple Linear Regression

26

Simple Linear Regression

27

Anscombe's Quartet