Univariate Linear Regression Problem
• Model: Y = β0 + β1X + ε
• Test: H0: β1=0.
• Alternative: H1: β1>0.
• The distribution of Y is normal under both null and alternative.
• Under null, var(Y) = σ0².
• Under alternative, β1 > 0, and var(Y) = σ1².
Step 1: Choose the test statistic and specify its null distribution
• Use conditions of the null to find:
$$\hat{\beta}_1 \sim N\!\left(0,\ \frac{\sigma_0^2}{\sum_{i=1}^{n}(x_i-\bar{x}_n)^2}\right).$$
Bringing sample size into regression design
• The sample size n is hidden in the regression results. That is, let:
$$\sum_{i=1}^{n}(x_i-\bar{x}_n)^2 = n\sigma_X^2.$$
Step 2: Define the critical value
• For the univariate linear regression test:
$$CV = 0 + |z_\alpha|\sqrt{\frac{\sigma_0^2}{n\sigma_X^2}} = 0 + |z_\alpha|\,\frac{\sigma_0/\sigma_X}{\sqrt{n}}.$$
Step 3: Define the Rejection Rule
• Each test is a right sided test, and so the rule is to reject when the test statistic is greater than the critical value.
Step 4: Specify the Distribution of Test Statistic under Alternative
• Use conditions of the alternative to find:
$$\hat{\beta}_1 \sim N\!\left(E_1,\ \frac{\sigma_1^2}{n\sigma_X^2}\right).$$
Step 5: Define a Type II Error
• For the univariate linear regression test:
$$\hat{\beta}_1 \le CV = 0 + |z_\alpha|\,\frac{\sigma_0/\sigma_X}{\sqrt{n}}.$$
Step 6: Find β
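• For a univariate linear regression test (here β is the probability of a Type II error, not a regression coefficient):
$$\beta = \Pr\nolimits_1\left\{\frac{\hat{\beta}_1 - E_1(\hat{\beta}_1)}{\sigma_1(\hat{\beta}_1)} \le \frac{0 + |z_\alpha|\,(\sigma_0/\sigma_X)/\sqrt{n} - E_1}{(\sigma_1/\sigma_X)/\sqrt{n}}\right\}.$$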
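The formula for β can be sketched in Python using only the standard library. This is a minimal illustration for the right-sided test, not code from the course: the function name `type_ii_error`, its parameter names, and the input values at the bottom are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(e1, sigma0, sigma1, sigma_x, n, alpha):
    """Type II error probability for the right-sided test of H0: beta1 = 0.

    e1      : expected value of the slope estimate under the alternative
    sigma0  : standard deviation of Y under the null
    sigma1  : standard deviation of Y under the alternative
    sigma_x : square root of sum((x_i - xbar)^2) / n
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)      # |z_alpha|
    se0 = sigma0 / (sigma_x * sqrt(n))           # sd of the slope estimate under H0
    se1 = sigma1 / (sigma_x * sqrt(n))           # sd of the slope estimate under H1
    critical_value = 0 + z_alpha * se0
    # H0 is accepted (a Type II error) when the estimate is at or below the CV.
    return std_normal.cdf((critical_value - e1) / se1)


# Hypothetical inputs, chosen only to exercise the function.
beta = type_ii_error(e1=0.1, sigma0=2.0, sigma1=2.0, sigma_x=5.0, n=50, alpha=0.05)
print(f"Type II error probability: {beta:.4f}")
```

Increasing `n` in the call shrinks both standard errors, which is why the later steps can solve for the sample size that reaches a target β.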
Basic Insight
• Notice that all three problems have the same basic structure.
• That is, if you understand the solution of the one sample test, then you can derive the answer to the other problems.
Step 7: Phrase requirement on β
• For example, we seek to “choose n so that β=0.01.”
• That is, “choose n so that Pr1{Accept H0}=β=0.01.”
Step 7: Phrase requirement on β
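• For example, we seek to choose n so that
$$\Pr\nolimits_1\left\{\frac{\hat{\beta}_1 - E_1(\hat{\beta}_1)}{\sigma_1(\hat{\beta}_1)} \le \frac{0 + |z_\alpha|\,(\sigma_0/\sigma_X)/\sqrt{n} - E_1}{(\sigma_1/\sigma_X)/\sqrt{n}}\right\} = \beta.$$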
Step 7: Phrase requirement on β
• Notice the parallel phrasing:
$$\Pr\{Z \le -|z_\beta|\} = \beta.$$
Step 7: Phrase requirement on β
• That is, choose n so that (note that E0 = 0):
$$\frac{E_0 + |z_\alpha|\,(\sigma_0/\sigma_X)/\sqrt{n} - E_1}{(\sigma_1/\sigma_X)/\sqrt{n}} = -|z_\beta|.$$
Step 7: Phrase requirement on β
• That is, choose n so that (after algebraic clearing out):
$$(E_1 - E_0)\sqrt{n} = |z_\alpha|\,\frac{\sigma_0}{\sigma_X} + |z_\beta|\,\frac{\sigma_1}{\sigma_X}.$$
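Solving this relation for n gives an explicit sample-size rule. The algebra is sketched here for the right-sided case with E1 > E0. Taking the absolute value of E1 − E0 and rounding n up to the next integer are conventions assumed here so that the same form can be reused for a left-sided test.

```latex
\sqrt{n} \;=\; \frac{|z_\alpha|\,\sigma_0 + |z_\beta|\,\sigma_1}{\sigma_X\,|E_1 - E_0|}
\qquad\Longrightarrow\qquad
n \;\ge\; \left(\frac{|z_\alpha|\,\sigma_0 + |z_\beta|\,\sigma_1}{\sigma_X\,|E_1 - E_0|}\right)^{2}.
```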
Step 8: State the conclusion
• The result for a left sided test has to be worked through but is similar. You must remember to keep all entries positive. This is reasonable if both α and β are constrained to be less than or equal to 0.5. The restriction is not a hardship in practice.
Univariate Linear Regression
• Note that the σ0 factor is changed to σ0/σX.
• There is a similar adjustment for the alternative standard deviation.
Example Problem Group
• Two hundred values of an independent variable xi are chosen so that Σ(xi − x̄)² is equal to 400,000. For each setting of xi, the random variable Yi = β0 + β1xi + σZi is observed. Here β0 and β1 are fixed but unknown parameters, σ = 400, and the Zi are independent standard normal random variables.
Example Problem Group
• The null hypothesis to be tested is H0: β1=0, α=0.01, and the alternative is H1: β1<0. The random variable B1 is the OLS estimate of β1.
Example Question 1
• When H0 is true, what is the standard deviation of B1, the OLS estimate of the slope?
• Var(B1) = σ²/Σ(xi − x̄)² = 400²/400,000 = 0.4.
• sd(B1)=0.632.
Example Question 2
• What is the probability of a Type II error in the test specified in the common section using B1, the OLS estimator of the slope, as test statistic when β1 = −4, α = 0.01, σ = 400, and Σ(xi − x̄)² is equal to 400,000?
Solution to Question 2
• The critical value is 0-2.326(0.632)=-1.47
• A Type II error occurs when B1>-1.47.
• Under alternative B1 is normal with expected value -4 and standard deviation (error) 0.632.
• Pr{B1>-1.47}=Pr{Z>(-1.47-(-4))/0.632} =Pr{Z>4.00}=.000032
• The answer is 0.000032.
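The arithmetic in this solution can be checked with a short Python sketch that uses the numbers from the problem statement. The variable names are illustrative only. Because the alternative is β1 < 0, the test is left-sided, so the critical value sits below zero and a Type II error occurs when B1 lies above it.

```python
from math import sqrt
from statistics import NormalDist

sigma = 400.0          # error standard deviation from the problem statement
sum_sq_x = 400_000.0   # sum of (x_i - xbar)^2
alpha = 0.01
slope_alt = -4.0       # value of beta1 under the alternative

sd_b1 = sigma / sqrt(sum_sq_x)                 # about 0.632
z_alpha = NormalDist().inv_cdf(1 - alpha)      # about 2.326
critical_value = 0 - z_alpha * sd_b1           # about -1.47

# Under the alternative, B1 is normal with mean -4 and standard deviation sd_b1.
b1_under_alt = NormalDist(mu=slope_alt, sigma=sd_b1)
type_ii = 1 - b1_under_alt.cdf(critical_value)  # Pr{B1 > CV}

print(f"sd(B1) = {sd_b1:.3f}, CV = {critical_value:.2f}, Type II = {type_ii:.6f}")
```

Carrying unrounded intermediate values can shift the last digit slightly relative to the hand calculation, which rounds the critical value to −1.47 first.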
Example Question 3
• How many observations n are necessary so that the probability of a Type II error in the test specified in the common section is 0.01 when β1 = −4, α = 0.01, σ = 400, and Σ(xi − x̄n)² is equal to 2,000n?
Outline of Solution to Problem 3
• For σ0 term, use (400²/2000)^0.5 = 8.94.
• Use same value for σ1 term.
• Use |z0.01|=2.326.
• Use |E1-E0|=|-4-0|=4.
• Square root of sample size is 10.39.
• Sample size is 109 or more.
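The outline can be reproduced with a few lines of Python. One assumption is made loudly here: the target Type II error probability is taken to be 0.01, inferred from the outline using a single |z0.01| value and from the stated result for the square root of the sample size. The variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

alpha = 0.01
beta = 0.01            # ASSUMED target Type II error probability (see lead-in)
sigma = 400.0
sigma_x_sq = 2000.0    # from sum of (x_i - xbar_n)^2 = 2,000 n
effect = abs(-4 - 0)   # |E1 - E0|

z_alpha = NormalDist().inv_cdf(1 - alpha)   # about 2.326
z_beta = NormalDist().inv_cdf(1 - beta)     # about 2.326

scaled_sd = sigma / sqrt(sigma_x_sq)        # sigma / sigma_X, about 8.94
# The same scaled standard deviation is used for the null and alternative terms.
sqrt_n = (z_alpha * scaled_sd + z_beta * scaled_sd) / effect
n_required = ceil(sqrt_n ** 2)

print(f"sqrt(n) = {sqrt_n:.2f}, n >= {n_required}")
```

With unrounded intermediates the square root comes out near 10.40 rather than 10.39, and the required sample size is still 109.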
Chapter 21: Residual Analysis
• If the assumptions in regression are violated:
– Residuals are one way of checking model:
Ri = Yi - Fitted value at xi
Checking the Assumptions
– Check for normality (test of normality, histogram, q-q plots)
– Check whether the variance is the same for all values of the independent variable (plot residuals against predicted values)
– Check independence (plot residuals against sequence variable)
– Check for linearity (plot dependent variable against independent variable)
Residual Plots
• Plot residuals against independent variable.
– Plot should be flat, indicating the same variance.
– There should be no fanning-out pattern.
– Check for influential observations.
• Plot residuals against predicted variable.
– For univariate regression this is the same as the above plot. There should be no pattern.
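As a minimal sketch of where the plotted quantities come from, the residuals Ri = Yi − fitted value at xi can be computed in plain Python. The data and the helper names `fit_line` and `residuals` are hypothetical and are not taken from the course data set. The printed pairs are what would go on the residual plot.

```python
from statistics import mean


def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(x, y):
    """Residual for each case: observed value minus fitted value."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

res = residuals(x, y)
for xi, ri in zip(x, res):
    print(f"x = {xi:>2}   residual = {ri:+.3f}")
```

OLS residuals always sum to zero and are uncorrelated with x, so the plot is informative through its shape (fanning out, curvature, isolated points), not through its average level.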
What to do if there is a problem?
• Can look for transformations of either independent or dependent variable or both.
• Using a computer this is easy: use the compute option from the menu bar.
Influential Points
• An easier way to look for points that have a large impact on the slope is to plot the change in slope against an arbitrary case sequence number.
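One way to obtain a "change in slope" for each case is to refit the line with that case deleted and compare slopes. The sketch here assumes that leave-one-out definition, which the slide does not spell out. The data and function names are hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_changes(x, y):
    """Full-data slope minus the slope with case i deleted, for every case i."""
    full_slope = ols_slope(x, y)
    changes = []
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        changes.append(full_slope - ols_slope(x_rest, y_rest))
    return changes


# Hypothetical data: the last case has an extreme x and lies off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 7.1, 8.0]

changes = slope_changes(x, y)
for case, change in enumerate(changes, start=1):
    print(f"case {case}: change in slope = {change:+.4f}")

most_influential = max(range(len(changes)), key=lambda i: abs(changes[i]))
```

Plotting `changes` against the case number gives the display described on the slide. A single tall spike marks a case whose removal moves the slope substantially.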
Example
• Data set on the web page
• aim: predict final exam score from midterm score
• dependent variable: final exam score
• independent variable: midterm score
• model, check assumptions, predict
[Figure: scatter plot of final examination score (vertical axis, 200 to 700) against score on first exam (horizontal axis, 0 to 300)]
Output
• Model: Y = β0 + β1X + ε
• R² = 0.508
• F statistic = 60.91, Significance = 0.0
• Estimate of β1 = 1.391117, t statistic = 7.805, Significance = 0.0
• Estimate of β0 = 238.95, t statistic = 8.329, Significance = 0.0
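The "predict" step of the example can be sketched with the reported coefficients. The sketch reads the output as a slope estimate of 1.391117 and an intercept estimate of 238.95. The function name `predict_final` and the midterm score of 200 are hypothetical. It also checks that the squared slope t statistic agrees with the F statistic, as it should in a regression with one predictor.

```python
# Coefficients as read from the regression output above.
intercept_hat = 238.95
slope_hat = 1.391117


def predict_final(midterm_score):
    """Predicted final exam score for a given midterm score."""
    return intercept_hat + slope_hat * midterm_score


# Hypothetical midterm score, for illustration only.
example_midterm = 200
example_prediction = predict_final(example_midterm)
print(f"Predicted final for midterm {example_midterm}: {example_prediction:.1f}")

# Consistency check: with a single predictor, F equals the squared slope t statistic.
t_slope = 7.805
f_stat = 60.91
t_squared = t_slope ** 2
print(f"t^2 = {t_squared:.2f}, F = {f_stat}")
```

The small gap between t² and F comes from rounding in the printed output.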
[Figure: residuals (vertical axis, -200 to 200) plotted against predicted value (horizontal axis, 300 to 600)]
[Figure: histogram of residuals. Std. Dev = 66.68, Mean = 0.0, N = 61]
[Figure: Normal Q-Q Plot of Residual, expected normal value (vertical axis, -3 to 3) against observed value (horizontal axis, -200 to 200)]
Next Class
• Multiple Regression!
• Check web site for your data file