lecture 3 - statistical aspects of regression

Statistical Aspects of Regression

$\hat{\alpha}$ and $\hat{\beta}$ are only estimates of $\alpha$ and $\beta$.
Key question: How accurate are these estimates?
Statistical procedures allow us to formally address this question.

What Factors Affect Accuracy of OLS Estimates?

Graphical intuition:
Figure 5.1 (small number of data points)
Figure 5.2 (large number of data points, but very scattered)
Figure 5.3 (large number of data points, not very scattered)
Figure 5.4 (large number of data points, but clustered near one value of X)

What Factors Affect Accuracy of OLS Estimates? (cont.)

Consider fitting a line through the XY-plots in Figures 5.1-5.4.
You would be most confident in the line you fit in Figure 5.3.
Larger number of data points + less scattering (i.e. less variability in errors) + more variability in X = more accurate estimates.
Note: Figures 5.1, 5.2, 5.3 and 5.4 all contain artificially generated data with $\alpha = 0$, $\beta = 1$.
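The three factors can be illustrated with a small simulation. The sketch below repeatedly generates artificial data with $\alpha = 0$, $\beta = 1$ (as in the figures) and records how much the OLS slope varies from one data set to the next. The sample sizes, error standard deviations and X ranges are assumptions chosen to mimic the four figures, not the values used to draw them, and the function names are hypothetical.

```python
import random
import statistics


def ols_slope(xs, ys):
    """OLS slope estimate for the simple regression of ys on xs."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_spread(n, error_sd, x_low, x_high, reps=500, seed=0):
    """Standard deviation of the slope estimate across `reps` artificial data sets."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        xs = [rng.uniform(x_low, x_high) for _ in range(n)]
        # True model: alpha = 0, beta = 1, plus a normally distributed error.
        ys = [0.0 + 1.0 * x + rng.gauss(0.0, error_sd) for x in xs]
        slopes.append(ols_slope(xs, ys))
    return statistics.stdev(slopes)


# (n, error_sd, x_low, x_high): assumed settings, loosely mimicking Figures 5.1-5.4
scenarios = {
    "small sample": (5, 1.0, 0.0, 5.0),
    "large errors": (100, 2.0, 0.0, 6.0),
    "small errors": (100, 0.1, 0.0, 6.0),
    "narrow X range": (100, 2.0, 2.9, 3.1),
}
spread = {name: slope_spread(*args) for name, args in scenarios.items()}
for name, value in spread.items():
    print(f"{name}: spread of slope estimates = {value:.3f}")
```

A smaller spread means a more accurate estimate. Under these settings the "small errors" case should give by far the tightest estimates and the "narrow X range" case the loosest.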

Figure 5.1: Very Small Sample Size [XY-plot; X axis from 0 to 5, Y axis from -2 to 5]

Figure 5.2: Large Sample Size, Large Error Variance [XY-plot; X axis from 0 to 6, Y axis from -4 to 8]

Figure 5.3: Large Sample Size, Small Error Variance [XY-plot; X axis from 0 to 6, Y axis from -1 to 6]

Figure 5.4: Limited Range of X Values [XY-plot; X axis from 0 to 3.5, Y axis from -4 to 10]

A Confidence Interval for $\beta$

Uncertainty about the accuracy of the estimate $\hat{\beta}$ can be summarised in a confidence interval.
The 95% confidence interval for $\beta$ is given by:

$$[\hat{\beta} - t_b s_b,\ \hat{\beta} + t_b s_b]$$

$t_b$ is a critical value from the Student t-distribution, calculated automatically in computer packages.

$s_b$ = standard error of $\hat{\beta}$; it is a measure of the accuracy of $\hat{\beta}$:

$$s_b = \sqrt{\frac{SSR}{(N-2)\sum (X_i - \bar{X})^2}}$$

A Confidence Interval for $\beta$ (cont.)

$t_b$ controls the confidence level (e.g. $t_b$ is bigger for 95% confidence than for 90%).
$s_b$ varies directly with SSR (i.e. how variable the residuals are).
$s_b$ varies inversely with N, the number of data points.
$s_b$ varies inversely with $\sum (X_i - \bar{X})^2$, which is related to the variance/variability of X.
Note: Different computer packages label confidence intervals in different ways. E.g. Excel labels the bounds of the confidence interval as Lower 95% and Upper 95%.
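As a minimal sketch of how these pieces fit together, the function below computes $\hat{\beta}$, $s_b$ and the confidence interval from the formulas above. The data are made up for illustration, the function name is hypothetical, and the critical value $t_b$ is passed in by hand from a t table (a package would normally supply it, for example `scipy.stats.t.ppf(0.975, N - 2)`).

```python
import math


def slope_confidence_interval(xs, ys, t_b):
    """Return (beta_hat, s_b, (lower, upper)) for the regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)  # sum of (X_i - X_bar)^2
    beta_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    # SSR: sum of squared residuals from the fitted line
    ssr = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    s_b = math.sqrt(ssr / ((n - 2) * sxx))
    return beta_hat, s_b, (beta_hat - t_b * s_b, beta_hat + t_b * s_b)


# Made-up data with N = 7 observations (an assumption for illustration only)
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8]
T_CRIT_95_DF5 = 2.571  # t table value for 95% confidence with N - 2 = 5

beta_hat, s_b, (lower, upper) = slope_confidence_interval(xs, ys, T_CRIT_95_DF5)
print(f"beta_hat = {beta_hat:.4f}, s_b = {s_b:.4f}")
print(f"95% confidence interval: [{lower:.4f}, {upper:.4f}]")
```

Passing a larger $t_b$ (say, the 99% value) widens the interval without changing $\hat{\beta}$ or $s_b$.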

Intuition of Confidence Interval

Useful (but formally incorrect) intuition: There is a 95% probability that the true value of $\beta$ lies in the confidence interval.
Correct intuition: If you repeatedly use the above formula for calculating a confidence interval, 95% of the intervals you construct will contain the true value of $\beta$.
You can choose any level of confidence you want (e.g. 90%, 99%).
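The "correct intuition" can be checked by simulation. The sketch below draws many artificial data sets with a known true slope, builds an interval of the form $[\hat{\beta} - t_b s_b, \hat{\beta} + t_b s_b]$ from each, and counts how often the true value falls inside. The sample size (30), error standard deviation, X range and number of repetitions are assumptions, and the critical value 2.048 is the t table entry for 28 degrees of freedom.

```python
import math
import random

T_CRIT_95_DF28 = 2.048  # t table value: 95% confidence, N - 2 = 28
TRUE_ALPHA, TRUE_BETA = 0.0, 1.0  # true values used to generate the data


def slope_interval(xs, ys, t_b):
    """Interval [beta_hat - t_b*s_b, beta_hat + t_b*s_b] for the slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    ssr = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    s_b = math.sqrt(ssr / ((n - 2) * sxx))
    return beta_hat - t_b * s_b, beta_hat + t_b * s_b


def coverage(reps=2000, n=30, error_sd=1.0, seed=1):
    """Share of simulated intervals that contain the true slope."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.uniform(0.0, 5.0) for _ in range(n)]
        ys = [TRUE_ALPHA + TRUE_BETA * x + rng.gauss(0.0, error_sd) for x in xs]
        lower, upper = slope_interval(xs, ys, T_CRIT_95_DF28)
        hits += lower <= TRUE_BETA <= upper
    return hits / reps


share = coverage()
print(f"Share of intervals containing the true slope: {share:.3f}")
```

The share should come out close to 0.95. Any single interval either contains the true value or does not, which is why the "95% probability" wording is only an informal intuition.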

Example: Confidence Intervals for the Data Sets in Figures 5.1-5.4

Data Set     $\hat{\beta}$   90% Confid. Interval   95% Confid. Interval   99% Confid. Interval
Figure 5.1   .91             [-.92,2.75]            [-1.57,3.39]            [-3.64,5.47]
Figure 5.2   1.04            [.75,1.32]             [.70,1.38]              [.59,1.49]
Figure 5.3   1.00            [.99,1.01]             [.99,1.02]              [.98,1.03]
Figure 5.4   1.52            [-1.33,4.36]           [-1.88,4.91]            [-2.98,6.02]

Example: The Regression of Executive Compensation on Profits

Y = executive compensation
X = profits
$\hat{\beta}$ = .000842
95% confidence interval: [.00061,.001075]

Example: The Regression of House Price on Lot Size

OLS results:

$$\hat{Y} = 34{,}136 + 6.59X$$

The OLS estimate of the marginal effect of X on Y is 6.59.
Increasing lot size by an extra square foot is associated with a $6.59 increase in house price.
The 95% confidence interval for $\beta$ is [5.72,7.47].
We are 95% confident that the effect of lot size on house price is at least $5.72 and at most $7.47.

Hypothesis Testing

Test whether $\beta = 0$ (i.e. whether X has any explanatory power).
One way of doing it: look at the confidence interval and check whether it contains zero.
If not, then you are confident that $\beta \neq 0$.

An alternative (equivalent) way is to use the t-statistic (often called the t-ratio):

$$t = \frac{\hat{\beta}}{s_b}$$

Big values for t (in absolute value) indicate $\beta \neq 0$.
Small values for t (in absolute value) indicate $\beta = 0$.

Hypothesis Testing (cont.)

Q: What do we mean by big and small?
A: Look at the P-value.
If P-value $\leq .05$ then t is big and conclude $\beta \neq 0$.
If P-value $> .05$ then t is small and conclude $\beta = 0$.

Useful (but formally incorrect) intuition: The P-value measures the probability that $\beta = 0$.
.05 = 5% = level of significance
Other levels of significance (e.g. 1% or 10%) are occasionally used.
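The two ways of testing $\beta = 0$ described above always agree, because zero lies outside $[\hat{\beta} - t_b s_b, \hat{\beta} + t_b s_b]$ exactly when $|\hat{\beta}| > t_b s_b$, that is, when $|t| > t_b$. For the usual two-sided test this is the same cutoff as a P-value of .05. The sketch below checks the agreement on a few made-up $(\hat{\beta}, s_b)$ pairs; the critical value 2.0 is a rough stand-in for the exact value a package would report, and the function names are hypothetical.

```python
def ci_excludes_zero(beta_hat, s_b, t_b):
    """First approach: does the confidence interval leave out zero?"""
    lower = beta_hat - t_b * s_b
    upper = beta_hat + t_b * s_b
    return not (lower <= 0.0 <= upper)


def t_ratio_is_big(beta_hat, s_b, t_b):
    """Second approach: is the t-ratio beyond the critical value?"""
    return abs(beta_hat / s_b) > t_b


T_B = 2.0  # illustrative critical value (assumption, roughly the 5% level)
cases = [(2.0, 0.5), (0.30, 0.25), (-1.20, 0.40), (-0.10, 0.30)]  # made-up values

decisions = []
for beta_hat, s_b in cases:
    by_interval = ci_excludes_zero(beta_hat, s_b, T_B)
    by_t_ratio = t_ratio_is_big(beta_hat, s_b, T_B)
    decisions.append((by_interval, by_t_ratio))
    print(f"beta_hat={beta_hat:6.2f}  t={beta_hat / s_b:6.2f}  "
          f"interval excludes 0: {by_interval}  t is big: {by_t_ratio}")
```

Note that a large negative t-ratio counts as "big" too, which is why the comparison uses the absolute value.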

Example: The Regression of Executive Compensation on Profits (cont.)

95% confidence interval: [.00061,.001075]
The confidence interval does not contain zero, so conclude that $\beta \neq 0$.

Alternatively: the t-ratio is 7.227937. Is this big? Yes, the P-value is $5.5 \times 10^{-10}$, which is much less than .05. Hence, we conclude again that $\beta \neq 0$.
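As a rough consistency check on the reported numbers, the standard error can be backed out from $t = \hat{\beta}/s_b$ using the estimate .000842 and the t-ratio, and the critical value implied by the reported confidence interval can then be recovered from its half-width. This is only a sketch: the interval endpoints are rounded, so the implied critical value is approximate, and the variable names are hypothetical.

```python
beta_hat = 0.000842                     # reported OLS estimate
t_ratio = 7.227937                      # reported t-ratio
ci_lower, ci_upper = 0.00061, 0.001075  # reported 95% confidence interval

# t = beta_hat / s_b, so s_b = beta_hat / t
s_b = beta_hat / t_ratio

# The interval is beta_hat +/- t_b * s_b, so its half-width is t_b * s_b
half_width = (ci_upper - ci_lower) / 2
implied_t_b = half_width / s_b

print(f"implied standard error s_b = {s_b:.3e}")
print(f"implied critical value t_b = {implied_t_b:.2f}")
```

The implied critical value comes out near 2, which is what a 95% interval from the t-distribution would be expected to use for a moderately sized sample.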

Jargon

The coefficient on profits is significantly different from zero.
Profits have statistically significant explanatory power for executive compensation.
The hypothesis that $\beta = 0$ can be rejected at the 5% significance level.

Testing on $R^2$: The F-statistic

Test whether $R^2 = 0$ (i.e. whether X has any explanatory power).
Note: In simple regression, testing $R^2 = 0$ and $\beta = 0$ are the same, but in multiple regression they will be different.
The F-statistic is a test statistic analogous to the t-statistic (e.g. small values of it indicate $R^2 = 0$).

$$F = \frac{(N-2)R^2}{1-R^2}$$

Testing on $R^2$: The F-statistic (cont.)

For a test with 5% level of significance:
If the P-value is $> .05$, conclude $R^2 = 0$.
If the P-value is $\leq .05$, conclude $R^2 \neq 0$.
Excel calls the P-value for this test Significance F.
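One way to see why the two tests coincide in simple regression is that the F-statistic equals the square of the t-ratio there. The sketch below verifies this numerically on made-up data, assuming the usual definition $R^2 = 1 - SSR/TSS$, where TSS is the total sum of squares of Y around its mean (that definition is not restated in this lecture).

```python
import math

# Made-up data for illustration only
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 2.9, 3.2, 4.8, 5.1, 5.7, 7.4, 7.9]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
beta_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
alpha_hat = y_bar - beta_hat * x_bar

ssr = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
tss = sum((y - y_bar) ** 2 for y in ys)                                 # total sum of squares
r2 = 1 - ssr / tss

s_b = math.sqrt(ssr / ((n - 2) * sxx))
t_ratio = beta_hat / s_b
f_stat = (n - 2) * r2 / (1 - r2)

print(f"R^2 = {r2:.4f}, t = {t_ratio:.4f}, t^2 = {t_ratio ** 2:.4f}, F = {f_stat:.4f}")
```

With more than one explanatory variable this identity no longer holds, which matches the note that the two tests differ in multiple regression.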

Example: The Regression of Executive Compensation on Profits (cont.)

P-value = Significance F = $5.5 \times 10^{-10}$.
Since P-value < .05, conclude $R^2 \neq 0$.
Profits do have explanatory power for Y.

Chapter Summary

1. Accuracy of OLS estimates depends on the number of data points, the variability of the explanatory variable and the variability of the errors.
2. The confidence interval provides an interval in which you can be confident $\beta$ lies.
3. The width of the confidence interval depends on the same factors as affect the accuracy of OLS estimates. In addition, the width of the confidence interval depends on the confidence level.
4. A hypothesis test of whether $\beta = 0$ can be used to find out whether the explanatory variable belongs in the regression. The P-value is a measure of how plausible the hypothesis is.
5. If the P-value for the hypothesis test of whether $\beta = 0$ is less than .05, then you can reject the hypothesis at the 5% level of significance.
6. If the P-value for the hypothesis test of whether $\beta = 0$ is greater than .05, then you cannot reject the hypothesis at the 5% level of significance.
7. A hypothesis test of whether $R^2 = 0$ can be used to investigate whether the regression helps explain the dependent variable.
