Univariate Linear Regression Problem

Upload: gervais-shepherd

Post on 04-Jan-2016


Univariate Linear Regression Problem

• Model: Y = β0 + β1X + ε

• Test: H0: β1=0.

• Alternative: H1: β1>0.

• The distribution of Y is normal under both null and alternative.

• Under null, var(Y) = σ0².

• Under alternative, β1 > 0, and var(Y) = σ1².

Step 1: Choose the test statistic and specify its null distribution

• Use conditions of the null to find:

$$\hat{\beta}_1 \sim N\!\left(0,\ \frac{\sigma_0^2}{\sum_{i=1}^{n}(x_i-\bar{x}_n)^2}\right).$$

Bringing sample size into regression design

• The sample size n is hidden in the regression results. That is, let:

$$\sum_{i=1}^{n}(x_i-\bar{x}_n)^2 = n\sigma_X^2.$$

Step 2: Define the critical value

• For the univariate linear regression test:

$$CV = 0 + |z_\alpha|\frac{\sigma_0}{\sqrt{n\sigma_X^2}} = 0 + |z_\alpha|\frac{\sigma_0/\sigma_X}{\sqrt{n}}.$$

Step 3: Define the Rejection Rule

• Each test is a right sided test, and so the rule is to reject when the test statistic is greater than the critical value.
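Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration and not part of the original slides: the function names and the numerical inputs (α = 0.05, σ0 = 10, σX = 2, n = 25) are assumptions chosen only to show the calculation, and σX is taken to be the square root of Σ(xi − x̄)²/n.

```python
from math import sqrt
from statistics import NormalDist

def critical_value(alpha, sigma0, sigma_x, n):
    """Critical value for the right-sided test of H0: beta1 = 0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)        # |z_alpha|
    return 0 + z_alpha * (sigma0 / sigma_x) / sqrt(n)

def reject_null(beta1_hat, alpha, sigma0, sigma_x, n):
    """Rejection rule: reject when the slope estimate exceeds the critical value."""
    return beta1_hat > critical_value(alpha, sigma0, sigma_x, n)

# Hypothetical inputs, for illustration only
cv = critical_value(alpha=0.05, sigma0=10.0, sigma_x=2.0, n=25)
print(f"critical value = {cv:.4f}")
print("reject H0 for estimate 2.0:", reject_null(2.0, 0.05, 10.0, 2.0, 25))
```

Because n enters only through the square root in the denominator, the critical value shrinks toward 0 as the sample size grows.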


Step 4: Specify the Distribution of Test Statistic under Alternative

• Use conditions of the alternative to find:

$$\hat{\beta}_1 \sim N\!\left(E_1,\ \frac{\sigma_1^2}{n\sigma_X^2}\right).$$

• Here E1 is the expected value of the slope estimate under the alternative.

Step 5: Define a Type II Error

• For the univariate linear regression test:

$$\hat{\beta}_1 \le CV = 0 + |z_\alpha|\frac{\sigma_0/\sigma_X}{\sqrt{n}}.$$

Step 6: Find β

• For a univariate linear regression test:

$$\beta = \Pr\nolimits_1\left\{\frac{\hat{\beta}_1 - E(\hat{\beta}_1)}{\sigma(\hat{\beta}_1)} \le \frac{0 + |z_\alpha|\dfrac{\sigma_0/\sigma_X}{\sqrt{n}} - E_1}{\dfrac{\sigma_1/\sigma_X}{\sqrt{n}}}\right\}.$$
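The Type II error probability for the right-sided test can be evaluated numerically. The sketch below is illustrative only: the function name `type2_error` and all inputs are hypothetical, and it assumes the slope estimate is normal with mean E1 and standard deviation (σ1/σX)/√n under the alternative.

```python
from math import sqrt
from statistics import NormalDist

def type2_error(n, e1, alpha, sigma0, sigma1, sigma_x):
    """Pr{slope estimate <= CV} when the true slope is e1 > 0 (right-sided test)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)           # |z_alpha|
    cv = 0 + z_alpha * (sigma0 / sigma_x) / sqrt(n)   # critical value under H0
    sd_alt = (sigma1 / sigma_x) / sqrt(n)             # sd of the estimate under H1
    return std_normal.cdf((cv - e1) / sd_alt)         # standardize, then use Phi

# Hypothetical inputs, chosen only for illustration
for n in (25, 100, 400):
    b = type2_error(n, e1=0.5, alpha=0.05, sigma0=10.0, sigma1=10.0, sigma_x=2.0)
    print(n, round(b, 3))
```

With the other inputs held fixed, the printed probabilities fall as n grows, which is what makes it possible to choose n to meet a requirement on β.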

Basic Insight

• Notice that all three problems have the same basic structure.

• That is, if you understand the solution of the one sample test, then you can derive the answer to the other problems.


Step 7: Phrase requirement on β

• For example, we seek to “choose n so that β=0.01.”

• That is, “choose n so that Pr1{Accept H0} = β = 0.01.”

Step 7: Phrase requirement on β

• For example, we seek to choose n so that

$$\beta = \Pr\nolimits_1\left\{\frac{\hat{\beta}_1 - E(\hat{\beta}_1)}{\sigma(\hat{\beta}_1)} \le \frac{0 + |z_\alpha|\dfrac{\sigma_0/\sigma_X}{\sqrt{n}} - E_1}{\dfrac{\sigma_1/\sigma_X}{\sqrt{n}}}\right\}.$$

Step 7: Phrase requirement on β

• Notice the parallel phrasing:

$$\Pr\{Z \le -|z_\beta|\} = \beta.$$

Step 7: Phrase requirement on β

• That is, “choose n so that (note that E0=0):

$$\frac{E_0 + |z_\alpha|\dfrac{\sigma_0/\sigma_X}{\sqrt{n}} - E_1}{\dfrac{\sigma_1/\sigma_X}{\sqrt{n}}} = -|z_\beta|.$$

Step 7: Phrase requirement on β

• That is, choose n so that (after algebraic clearing out):

$$\sqrt{n}\,(E_1 - E_0) = |z_\alpha|\frac{\sigma_0}{\sigma_X} + |z_\beta|\frac{\sigma_1}{\sigma_X}.$$
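The algebraic clearing out can be written step by step. The following is a sketch of the intermediate steps for the right-sided test, starting from the requirement that the standardized critical value equal −|z_β|; the final line, which solves for n, is an added convenience and is not shown on the slides.

```latex
\begin{align*}
\frac{E_0 + |z_\alpha|\,\dfrac{\sigma_0/\sigma_X}{\sqrt{n}} - E_1}
     {\dfrac{\sigma_1/\sigma_X}{\sqrt{n}}} &= -|z_\beta| \\[4pt]
E_0 - E_1 + |z_\alpha|\,\frac{\sigma_0/\sigma_X}{\sqrt{n}}
  &= -|z_\beta|\,\frac{\sigma_1/\sigma_X}{\sqrt{n}} \\[4pt]
\sqrt{n}\,(E_1 - E_0)
  &= |z_\alpha|\,\frac{\sigma_0}{\sigma_X} + |z_\beta|\,\frac{\sigma_1}{\sigma_X} \\[4pt]
n &= \left(\frac{|z_\alpha|\,\sigma_0/\sigma_X + |z_\beta|\,\sigma_1/\sigma_X}{E_1 - E_0}\right)^{2}
\end{align*}
```

In practice n is rounded up to the next integer, so the achieved Type II error probability is no larger than the target.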

Step 8: State the conclusion

• The result for a left sided test has to be worked through but is similar. You must remember to keep all entries positive. This is reasonable if both α and β are constrained to be less than or equal to 0.5. The restriction is not a hardship in practice.


Univariate Linear Regression

• Note that the σ0 factor is changed to σ0/σX.

• There is a similar adjustment for the alternative standard deviation.


Example Problem Group

• Two hundred values of an independent variable xi are chosen so that Σ(xi − x̄)² is equal to 400,000. For each setting of xi, the random variable Yi = β0 + β1xi + σZi is observed. Here β0 and β1 are fixed but unknown parameters, σ = 400, and the Zi are independent standard normal random variables.

Example Problem Group

• The null hypothesis to be tested is H0: β1=0, α=0.01, and the alternative is H1: β1<0. The random variable B1 is the OLS estimate of β1.


Example Question 1

• When H0 is true, what is the standard deviation of B1, the OLS estimate of the slope?

• Var(B1) = σ²/Σ(xi − x̄)² = 400²/400,000 = 0.4.

• sd(B1)=0.632.
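The value 0.632 can also be checked by simulation. The sketch below is an illustration under stated assumptions: the design (100 points at −√2000 and 100 at +√2000) is hypothetical and chosen only because it gives n = 200 and Σ(xi − x̄)² = 400,000, and β0 = 50 is arbitrary since it does not affect the slope estimate.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)

# Assumed design (not from the slides): n = 200 and sum((x - xbar)^2) = 400,000
x = [-sqrt(2000)] * 100 + [sqrt(2000)] * 100
x_bar = mean(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)

def ols_slope(x, y):
    """OLS slope estimate B1 for the fixed design x."""
    y_bar = mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

sigma = 400.0
beta0, beta1 = 50.0, 0.0   # beta0 is arbitrary; beta1 = 0 under H0

slopes = []
for _ in range(2000):      # 2000 simulated data sets
    y = [beta0 + beta1 * xi + sigma * random.gauss(0, 1) for xi in x]
    slopes.append(ols_slope(x, y))

simulated_sd = stdev(slopes)
theoretical_sd = sigma / sqrt(sxx)
print(f"theoretical sd = {theoretical_sd:.3f}, simulated sd = {simulated_sd:.3f}")
```

The simulated value should land close to the theoretical one, with the gap shrinking as the number of simulated data sets increases.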


Example Question 2

• What is the probability of a Type II error in the test specified in the common section, using B1, the OLS estimator of the slope, as the test statistic, when β1 = −4, α = 0.01, σ = 400, and Σ(xi − x̄)² is equal to 400,000?

Solution to Question 2

• The critical value is 0-2.326(0.632)=-1.47

• A Type II error occurs when B1>-1.47.

• Under alternative B1 is normal with expected value -4 and standard deviation (error) 0.632.

• Pr{B1>-1.47}=Pr{Z>(-1.47-(-4))/0.632} =Pr{Z>4.00}=.000032

• The answer is 0.000032.
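The same calculation can be reproduced with Python's standard library. This is a sketch of the left-sided computation using the values stated in the question; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma, sxx, alpha, beta1_alt = 400.0, 400_000.0, 0.01, -4.0

sd_b1 = sigma / sqrt(sxx)                      # standard deviation of B1
z_alpha = NormalDist().inv_cdf(1 - alpha)      # |z_0.01|
cv = 0 - z_alpha * sd_b1                       # left-sided critical value

# Under the alternative, B1 is normal with mean -4 and the same standard deviation.
b1_under_h1 = NormalDist(mu=beta1_alt, sigma=sd_b1)
type2 = 1 - b1_under_h1.cdf(cv)                # Pr{B1 > CV | beta1 = -4}

print(f"CV = {cv:.2f}, Pr(Type II error) = {type2:.6f}")
```

Using unrounded intermediate values gives a result that agrees with the hand calculation to the precision shown on the slide.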


Example Question 3

• How many observations n are necessary so that the probability of a Type II error in the test specified in the common section is 0.01 when β1 = −4, α = 0.01, σ = 400, and Σ(xi − x̄n)² is equal to 2,000n?

Outline of Solution to Problem 3

• For σ0 term, use (400²/2000)^0.5 = 8.94.

• Use same value for σ1 term.

• Use |z0.01|=2.326.

• Use |E1-E0|=|-4-0|=4.

• Square root of sample size is at least (2.326 + 2.326)(8.94)/4 = 10.40.

• Sample size is 109 or more.
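The outline can be checked numerically. The sketch below assumes a target Type II error probability of 0.01, which is the value consistent with the outline's single use of |z0.01|; the function name `required_n` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def required_n(delta, alpha, beta, sigma0, sigma1, sigma_x):
    """Return (sqrt(n), smallest integer n) meeting the alpha and beta targets."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # |z_alpha|
    z_beta = NormalDist().inv_cdf(1 - beta)     # |z_beta|
    root_n = (z_alpha * sigma0 / sigma_x + z_beta * sigma1 / sigma_x) / abs(delta)
    return root_n, ceil(root_n ** 2)

# sigma_x^2 = 2000 because sum((x - xbar)^2) = 2000 n; beta = 0.01 is an assumption
root_n, n = required_n(delta=-4 - 0, alpha=0.01, beta=0.01,
                       sigma0=400, sigma1=400, sigma_x=sqrt(2000))
print(f"sqrt(n) = {root_n:.2f}, n = {n}")
```

Taking the absolute value of E1 − E0 keeps all entries positive, so the same function covers the left-sided test in this example.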


Chapter 21: Residual Analysis

• The assumptions in regression may be violated, so the model needs to be checked:

– Residuals are one way of checking the model:

Ri = Yi - Fitted value at xi
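A minimal sketch of computing residuals from a least-squares line follows. The data and the helper names `fit_line` and `residuals` are hypothetical and do not appear in the slides.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(x, y):
    """R_i = Y_i minus the fitted value at x_i."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = residuals(x, y)
print([round(ri, 3) for ri in r])
```

Least-squares residuals always sum to zero when the model includes an intercept, so it is their pattern, not their average, that carries information about the assumptions.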


Checking the Assumptions

– Check for normality (test of normality, histogram, q-q plots)

– Check whether the variance is the same for all values of the independent variable (plot residuals against predicted values)

– Check independence (plot residuals against sequence variable)

– Check for linearity (plot dependent variable against independent variable)
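Two of these checks can be sketched without plotting software. The code below is an illustration under assumptions: the residuals and fitted values are hypothetical, the plotting position (i + 0.5)/n is one common convention for Q-Q plots, and comparing the residual spread in the lower and upper halves of the fitted values is only a rough stand-in for inspecting the residual plot.

```python
from statistics import NormalDist, pstdev

def qq_points(res):
    """Pairs (observed residual, expected standard normal value) for a Q-Q plot."""
    n = len(res)
    std_normal = NormalDist()
    expected = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(sorted(res), expected))

def spread_by_half(fitted, res):
    """Residual standard deviation for the lower and upper half of fitted values."""
    order = sorted(range(len(res)), key=lambda i: fitted[i])
    half = len(order) // 2
    low = [res[i] for i in order[:half]]
    high = [res[i] for i in order[half:]]
    return pstdev(low), pstdev(high)

# Hypothetical residuals and fitted values, for illustration only
res = [-1.2, 0.4, 0.9, -0.3, 0.1, -0.6, 1.1, -0.4]
fitted = [10, 12, 14, 16, 18, 20, 22, 24]

points = qq_points(res)
low_sd, high_sd = spread_by_half(fitted, res)
print(points[0], points[-1])
print(round(low_sd, 3), round(high_sd, 3))
```

Roughly normal residuals give Q-Q points near a straight line, and a large difference between the two spreads would suggest a variance that changes with the predicted value.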


Residual Plots

• Plot residuals against independent variable.

– Plot should be flat indicating the same variance.

– There should be no fanning out pattern.

– Check for influential observations.

• Plot residuals against predicted variable.

– For univariate regression this is the same as the above plot. There should be no pattern.

What to do if problem?

• Can look for transformations of either independent or dependent variable or both.

• Using a computer this is easy: use the Compute option from the menu bar.

Influential Points

• An easier way to look for points that have a large impact on the slope is to plot the change in slope against an arbitrary case sequence number.
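One way to obtain the change in slope for each case is to refit the line with that case left out. The sketch below assumes that "change in slope" means the full-sample slope minus the slope with the case deleted; the data and function names are hypothetical.

```python
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

def slope_changes(x, y):
    """Full-sample slope minus the slope refitted without case i, for each i."""
    full = slope(x, y)
    changes = []
    for i in range(len(x)):
        x_without = x[:i] + x[i + 1:]
        y_without = y[:i] + y[i + 1:]
        changes.append(full - slope(x_without, y_without))
    return changes

# Hypothetical data; the last case sits far from the others in x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 5.0]

changes = slope_changes(x, y)
most_influential = max(range(len(x)), key=lambda i: abs(changes[i]))
for case, change in enumerate(changes, start=1):
    print(case, round(change, 3))
```

Plotting these values against the case number makes an influential point stand out as a spike; here the isolated last case produces by far the largest change.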


Example

• Data set is on the web page

• aim: predict final exam score from midterm score

• dependent variable: final exam score

• independent variable: midterm score

• model, check assumptions, predict


[Figure: scatter plot of final examination score (vertical axis, 200 to 700) against score on first exam (horizontal axis, 0 to 300)]

Output

• Model: Y = β0 + β1X + ε

• R² = 0.508

• F statistic = 60.91, Significance = 0.0

• β̂1 = 1.391117, t statistic = 7.805, Significance = 0.0

• β̂0 = 238.95, t statistic = 8.329, Significance = 0.0
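The reported numbers can be cross-checked against each other. The sketch below reads 1.391117 as the estimated slope and 238.95 as the estimated intercept, which is an interpretation of the output, takes n = 61 from the residual histogram, and uses a hypothetical midterm score of 200 for the prediction.

```python
f_stat = 60.91
t_slope = 7.805
r_squared = 0.508
n = 61                       # "N = 61.00" reported with the residual histogram
b0, b1 = 238.95, 1.391117    # assumed reading: intercept and slope estimates

# In simple regression the squared t statistic for the slope equals the F statistic.
t_squared = t_slope ** 2

# R^2 can be recovered from F and the residual degrees of freedom, n - 2.
r2_from_f = f_stat / (f_stat + (n - 2))

def predict_final(midterm):
    """Predicted final exam score for a given midterm score."""
    return b0 + b1 * midterm

print(f"t^2 = {t_squared:.2f} (F = {f_stat})")
print(f"R^2 from F = {r2_from_f:.3f} (reported {r_squared})")
print(f"predicted final for midterm 200 = {predict_final(200):.1f}")
```

Agreement between t² and F, and between the recovered and reported R², supports reading the t statistic of 7.805 as belonging to the slope.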

[Figure: plot of Residual (vertical axis, −200 to 200) against Predicted Value (horizontal axis, 300 to 600)]

[Figure: histogram of Residual (horizontal axis, −160.0 to 120.0; vertical axis, 0 to 14). Std. Dev = 66.68, Mean = 0.0, N = 61.00]

[Figure: Normal Q-Q Plot of Residual; Expected Normal Value (vertical axis, −3 to 3) against Observed Value (horizontal axis, −200 to 200)]

Next Class

• Multiple Regression!

• Check web site for your data file