Regression Models

Part 2: Model and Inference (49 slides). Professor William Greene, Stern School of Business, IOMS Department, Department of Economics

Post on 06-Feb-2016

Page 1: Regression Models

Part 2: Model and Inference

Regression Models

Professor William Greene

Stern School of Business

IOMS Department

Department of Economics

Page 2: Regression Models

Regression and Forecasting Models

Part 2 – Inference About the Regression

Page 3: Regression Models

The Linear Regression Model

1. The linear regression model

2. Sample statistics and population quantities

3. Testing the hypothesis of no relationship

Page 4: Regression Models

A Linear Regression

Predictor: Box Office = -14.36 + 72.72 Buzz

Page 5: Regression Models

Data and Relationship

We suggested that the relationship between box office and internet buzz is

Box Office = -14.36 + 72.72 Buzz

Note the obvious inconsistency in the figure: the observed points do not lie on a line, so this equation cannot be the exact relationship.

How do we reconcile the equation with the data?

Page 6: Regression Models

Modeling the Underlying Process

A model that explains the process that produces the data that we observe:

Observed outcome = the sum of two parts
(1) Explained: The regression line
(2) Unexplained (noise): The remainder

Regression model: The “model” is the statement that part (1) is the same process from one observation to the next. Part (2) is the randomness that is part of real world observation.

Page 7: Regression Models

The Population Regression

THE model: A specific statement about the parts of the model

(1) Explained: Explained Box Office = β0 + β1 Buzz
(2) Unexplained: The rest is “noise,” ε. Random ε has certain characteristics.

Model statement:

Box Office = β0 + β1 Buzz + ε
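
To make the model statement concrete, here is a minimal simulation sketch. The parameter values (beta0 = -14, beta1 = 72, sigma = 13), the buzz values, and the function name are hypothetical, chosen only for illustration. The normal noise anticipates the normality assumption that the slides state later.

```python
import random

def simulate_box_office(buzz_values, beta0, beta1, sigma, seed=0):
    """Generate outcomes as explained part (beta0 + beta1*buzz) plus random noise."""
    rng = random.Random(seed)
    outcomes = []
    for buzz in buzz_values:
        explained = beta0 + beta1 * buzz   # part (1): the regression line
        noise = rng.gauss(0.0, sigma)      # part (2): the disturbance, mean zero
        outcomes.append(explained + noise)
    return outcomes

# Hypothetical values, for illustration only.
buzz = [0.2, 0.4, 0.6, 0.8]
box_office = simulate_box_office(buzz, beta0=-14.0, beta1=72.0, sigma=13.0)
for x, y in zip(buzz, box_office):
    print(f"buzz={x:.1f}  explained={-14.0 + 72.0 * x:6.2f}  observed={y:6.2f}")
```

The observed values scatter around the explained values, which is why the data points do not lie exactly on the line.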

Page 8: Regression Models

The Data Include the Noise

Page 9: Regression Models

The Data Include the Noise

β0 + β1 Buzz

Box = 41, β0 + β1 Buzz = 10, ε = 31

Page 10: Regression Models

Model Assumptions

yi = β0 + β1xi + εi

β0 + β1xi is the ‘regression function’

Contains the ‘information’ about yi in xi

Unobserved because β0 and β1 are not known for certain

εi is the ‘disturbance.’ It is the unobserved random component

Observed yi is the sum of the two unobserved parts.

Page 11: Regression Models

Regression Model Assumptions About εi

Random Variable
(1) The regression is the mean of yi for a particular xi. εi is the deviation of yi from the regression line.
(2) εi has mean zero.
(3) εi has variance σ².

‘Random’ Noise
(4) εi is unrelated to any values of xi (no covariance): it’s “random noise.”
(5) εi is unrelated to any other observations on εj (not “autocorrelated”).
(6) Normal distribution: εi is the sum of many small influences.

Page 12: Regression Models

Regression Model

[Figure: Scatterplot of FUELBILL vs ROOMS. Horizontal axis: ROOMS, 2 to 11. Vertical axis: FUELBILL, 200 to 1400.]

Page 13: Regression Models

Conditional Normal Distribution of ε

[Figure: Scatterplot of FUELBILL vs ROOMS. Horizontal axis: ROOMS, 2 to 11. Vertical axis: FUELBILL, 200 to 1400.]

Page 14: Regression Models

A Violation of Point (4)

c = β0 + β1 q + ε

Electricity Cost Data

Page 15: Regression Models

A Violation of Point (5) - Autocorrelation

Time Trend of U.S. Gasoline Consumption

Page 16: Regression Models

No Obvious Violations of Assumptions

Auction Prices for Monet Paintings vs. Area

Page 17: Regression Models

Samples and Populations

Population (Theory)
    yi = β0 + β1xi + εi
    Parameters: β0, β1
    Regression: β0 + β1xi, the mean of yi | xi
    Disturbance, εi: expected value = 0, standard deviation σ, no correlation with xi

Sample (Observed)
    yi = b0 + b1xi + ei
    Estimates: b0, b1
    Fitted regression: b0 + b1xi, the predicted yi | xi
    Residuals, ei: sample mean 0, sample std. dev. se, sample Cov[x,e] = 0

Page 18: Regression Models

Disturbances vs. Residuals

True: β0 + β1 Buzz

Sample: b0 + b1 Buzz

ε = y - β0 - β1 Buzz

e = y - b0 - b1 Buzz

Page 19: Regression Models

Standard Deviation of Residuals

Standard deviation of εi = yi - β0 - β1xi is σ

σ = √E[εi²] (Mean of εi is zero)

Sample b0 and b1 estimate β0 and β1

Residual ei = yi - b0 - b1xi estimates εi

Use √((1/N)Σei²) to estimate σ? Close, not quite.

$$ s_e = \sqrt{\frac{\sum_{i=1}^{N} e_i^2}{N-2}} = \sqrt{\frac{\sum_{i=1}^{N} (y_i - b_0 - b_1 x_i)^2}{N-2}} $$

Why N-2? Relates to the fact that two parameters (β0, β1) were estimated. Same reason N-1 was used to compute a sample variance.
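
As a small illustration of the N-2 divisor, the sketch below fits a least squares line to four made-up points and computes the residual standard deviation. The data and function names are hypothetical, not from the slides.

```python
import math

def fit_line(x, y):
    """Least squares intercept b0 and slope b1 for a simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residual_std_dev(x, y):
    """s_e = sqrt(sum of squared residuals / (N - 2))."""
    b0, b1 = fit_line(x, y)
    residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / (len(x) - 2))

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]
s_e = residual_std_dev(x, y)
print(f"s_e with N-2 divisor: {s_e:.4f}")
```

Dividing by N instead of N-2 would give a smaller number, which is the sense in which the naive estimate is "close, not quite."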

Page 20: Regression Models

Page 21: Regression Models

Linear Regression

Sample Regression Line

Page 22: Regression Models

Residuals

Page 23: Regression Models

Regression Computations

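N = 62 complete observations.

$$ \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i = 20.721 $$

$$ \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = 0.48242 $$

$$ \mathrm{Var}(x) = s_x^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2 = 0.02453 $$

$$ \mathrm{Var}(y) = s_y^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y})^2 = 305.985 $$

$$ \mathrm{Cov}(x,y) = s_{xy} = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) = 1.784 $$

$$ b_1 = \frac{s_{xy}}{s_x^2} = 72.72 $$

$$ b_0 = \bar{y} - b_1 \bar{x} = -14.36 $$

$$ s_e = \sqrt{\frac{\sum_{i=1}^{62} (y_i - b_0 - b_1 x_i)^2}{N-2}} = 13.386 $$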
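
The slope and intercept can be rechecked from the summary statistics reported on this slide. This is only an arithmetic sketch; because the inputs are rounded, the results match the reported coefficients to within about 0.01 rather than exactly.

```python
# Summary statistics as reported on the slide (rounded values).
y_bar = 20.721     # mean of box office
x_bar = 0.48242    # mean of buzz
var_x = 0.02453    # sample variance of buzz
cov_xy = 1.784     # sample covariance of buzz and box office

b1 = cov_xy / var_x        # slope = covariance / variance of x
b0 = y_bar - b1 * x_bar    # intercept = mean of y minus slope times mean of x

print(f"b1 = {b1:.2f}, b0 = {b0:.2f}")
```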

Page 24: Regression Models

Page 25: Regression Models

Page 26: Regression Models

Results to Report

Page 27: Regression Models

The Reported Results

Page 28: Regression Models

Estimated equation

Page 29: Regression Models

Estimated coefficients b0 and b1

Page 30: Regression Models

Sum of squared residuals, Σi ei²

Page 31: Regression Models

S = se = estimated std. deviation of ε

Page 32: Regression Models

Interpreting σ (Estimated by se)

Remember the empirical rule: about 95% of observations will lie within mean ± 2 standard deviations. We show (b0 + b1x) ± 2se below.

This point is 2.2 standard deviations from the regression.

Only 3.2% of the 62 observations (2 of them) lie outside the bounds. (We will refine this later.)

Page 33: Regression Models

No Relationship: β1 = 0

Relationship: β1 ≠ 0

How to Distinguish These Cases Statistically?

yi = β0 + β1xi + εi

Page 34: Regression Models

Assumptions

(Regression) The equation linking “Box Office” and “Buzz” is stable

E[Box Office | Buzz] = α + β Buzz

Another sample of movies, say 2012, would obey the same fundamental relationship.

Page 35: Regression Models

Sampling Variability

Samples 0 and 1 are a random split of the 62 observations.

Sample 1: Box Office = -13.25 + 68.51 Buzz

Sample 0: Box Office = -16.09 + 79.11 Buzz
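
The split-sample idea can be imitated with simulated data. In the sketch below the 62 observations are generated from hypothetical parameter values (they are not the movie data), then randomly split into two halves that are fit separately.

```python
import random

def fit_line(x, y):
    """Least squares intercept b0 and slope b1 for a simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

rng = random.Random(1)

# Simulated stand-in for the 62 observations (hypothetical parameters).
x = [rng.uniform(0.2, 0.8) for _ in range(62)]
y = [-14.0 + 72.0 * xi + rng.gauss(0.0, 13.0) for xi in x]

# Random split into two samples of 31 observations each.
indices = list(range(62))
rng.shuffle(indices)
half0, half1 = indices[:31], indices[31:]

for label, idx in (("Sample 0", half0), ("Sample 1", half1)):
    b0, b1 = fit_line([x[i] for i in idx], [y[i] for i in idx])
    print(f"{label}: y = {b0:.2f} + {b1:.2f} x")
```

Both halves come from the same underlying process, yet the two fitted lines differ. That difference is the sampling variability the slide refers to.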

Page 36: Regression Models

Sampling Distributions

Sampling Distribution of the Mean

Estimator: $\bar{x}$

Standard Error:

$$ s_{\bar{x}} = \frac{s_x}{\sqrt{N}} = \sqrt{\frac{1}{N}\cdot\frac{\sum_{i=1}^{N}(x_i-\bar{x})^2}{N-1}} $$

Confidence Interval: $\bar{x} \pm t^{*} s_{\bar{x}}$, where t* is the appropriate value from the t table (N-1 degrees of freedom).

Sampling Distribution of a Regression Coefficient

Estimator: $b_1$

Standard Error:

$$ s_{b_1} = \frac{s_e}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^2}} = \sqrt{\frac{\frac{1}{N-2}\sum_{i=1}^{N}(y_i-b_0-b_1x_i)^2}{\sum_{i=1}^{N}(x_i-\bar{x})^2}} $$

Confidence Interval: $b_1 \pm t^{*} s_{b_1}$, where t* is the appropriate value from the t table (N-2 degrees of freedom).
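
As a check on the standard error formula for the slope, the sketch below plugs in summary values reported elsewhere in these slides for the buzz regression (N = 62, se = 13.3863, Var(x) = 0.02453). It uses the identity Σ(xi - x̄)² = (N-1)·Var(x). The variable names are mine.

```python
import math

n = 62            # number of observations
s_e = 13.3863     # standard deviation of the residuals
var_x = 0.02453   # sample variance of buzz

# Sum of squared deviations of x, recovered from the sample variance.
sum_sq_dev_x = (n - 1) * var_x

# Standard error of the slope: s_e / sqrt(sum of squared deviations of x).
se_b1 = s_e / math.sqrt(sum_sq_dev_x)
print(f"standard error of b1 = {se_b1:.2f}")
```

More noise (larger se) raises the standard error, while more observations or more spread in x lowers it.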

Page 37: Regression Models

n = N-2

Small sample

Large sample

Page 38: Regression Models

Standard Error of Regression Slope Estimator

Page 39: Regression Models

Internet Buzz Regression

Regression Analysis: BoxOffice versus Buzz

The regression equation is
BoxOffice = - 14.4 + 72.7 Buzz

Predictor     Coef   SE Coef      T      P
Constant   -14.360     5.546  -2.59  0.012
Buzz         72.72     10.94   6.65  0.000

S = 13.3863   R-Sq = 42.4%   R-Sq(adj) = 41.4%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        1   7913.6  7913.6  44.16  0.000
Residual Error   60  10751.5   179.2
Total            61  18665.1

Range of Uncertainty for b is 72.72 - 1.96(10.94) to 72.72 + 1.96(10.94) = [51.27 to 94.17]

If you use 2.00 from the t table, the limits would be [50.8 to 94.6]
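
The interval arithmetic can be reproduced with a few lines. This sketch uses the rounded Coef and SE Coef from the Minitab output, so its limits may differ in the second decimal from figures computed with unrounded values. The helper name is hypothetical.

```python
def confidence_interval(estimate, std_error, critical_value):
    """Return (lower, upper) limits: estimate +/- critical_value * std_error."""
    half_width = critical_value * std_error
    return estimate - half_width, estimate + half_width

b1, se_b1 = 72.72, 10.94   # slope and its standard error from the output

for crit in (1.96, 2.00):  # large-sample value, and t table value for 60 d.f.
    low, high = confidence_interval(b1, se_b1, crit)
    print(f"critical value {crit:.2f}: [{low:.2f}, {high:.2f}]")
```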

Page 40: Regression Models

Some computer programs report confidence intervals automatically; Minitab does not.

Page 41: Regression Models

Uncertainty About the Regression Slope

Hypothetical Regression: Fuel Bill vs. Number of Rooms

The regression equation is
Fuel Bill = -252 + 136 Number of Rooms

Predictor     Coef   SE Coef      T      P
Constant    -251.9     44.88  -5.20  0.000
Rooms        136.2      7.09   19.9  0.000

S = 144.456   R-Sq = 72.2%   R-Sq(adj) = 72.0%

This is b1, the estimate of β1

This “Standard Error,” (SE) is the measure of uncertainty about the true value.

The “range of uncertainty” is b ± 2 SE(b). (Actually 1.96, but people use 2)

Page 42: Regression Models

Sampling Distributions and Test Statistics

For Testing a Hypothesis about a Mean

Hypothesis: $H_0: \mu = 0$, $H_1: \mu \neq 0$

Estimator: $\bar{x}$

Standard Error:

$$ s_{\bar{x}} = \frac{s_x}{\sqrt{N}} = \sqrt{\frac{1}{N}\cdot\frac{\sum_{i=1}^{N}(x_i-\bar{x})^2}{N-1}} $$

Test Statistic: $t = \dfrac{\bar{x} - 0}{s_{\bar{x}}}$; t statistic with N-1 D.F.

Rejection Region: |t| > Critical Value from Table

For Testing a Hypothesis about a Regression Coefficient

Hypothesis: $H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$

Estimator: $b_1$

Standard Error:

$$ s_{b_1} = \frac{s_e}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^2}} = \sqrt{\frac{\frac{1}{N-2}\sum_{i=1}^{N}(y_i-b_0-b_1x_i)^2}{\sum_{i=1}^{N}(x_i-\bar{x})^2}} $$

Test Statistic: $t = \dfrac{b_1 - 0}{s_{b_1}}$; t statistic with N-2 D.F.

Rejection Region: |t| > Critical Value from Table
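
A minimal sketch of the test for the buzz regression, using the slope 72.72 and standard error 10.94 reported in the regression output. The critical value 2.00 is the t table value for 60 degrees of freedom quoted in these slides. The function names are hypothetical.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error

def reject_null(t_stat, critical_value):
    """Two-sided rejection rule: reject when |t| exceeds the critical value."""
    return abs(t_stat) > critical_value

t = t_statistic(72.72, 10.94)   # test of H0: beta1 = 0
print(f"t = {t:.2f}, reject H0: {reject_null(t, 2.00)}")
```

The `hypothesized` argument defaults to zero, which matches the hypothesis of no relationship. Passing another value gives the test of any other hypothesized slope.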

Page 43: Regression Models

t Statistic for Hypothesis Test

Page 44: Regression Models

Alternative Approach: The P value

Hypothesis: β1 = 0

The ‘P value’ is the probability that you would have observed evidence at least as strong as the evidence you did observe if the null hypothesis were true.

P = Prob(|t| would be this large | β1 = 0)

If the P value is less than the Type I error probability (usually 0.05) you have chosen, you will reject the hypothesis.

Interpret: If the hypothesis were true, it is ‘unlikely’ that I would have observed this evidence.
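
Statistical software reports the P value directly. The sketch below shows one way to compute a two-sided P value from scratch, by numerically integrating the Student t density with Simpson's rule. The integration approach and function names are my own choices for a standard-library-only illustration, not the method used by any particular program.

```python
import math

def t_density(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + t * t / df))

def two_sided_p_value(t_stat, df, steps=2000):
    """Prob(|T| >= |t_stat|): one minus the area between -|t| and +|t|."""
    a = abs(t_stat)
    if a == 0.0:
        return 1.0
    h = 2 * a / steps                      # steps must be even for Simpson's rule
    total = t_density(-a, df) + t_density(a, df)
    for k in range(1, steps):
        weight = 4 if k % 2 == 1 else 2    # Simpson weights 4, 2, 4, ...
        total += weight * t_density(-a + k * h, df)
    central_area = total * h / 3
    return max(0.0, 1.0 - central_area)

# t = 6.65 with N - 2 = 60 degrees of freedom.
p = two_sided_p_value(6.65, 60)
print(f"P value = {p:.2e}, reject at 0.05: {p < 0.05}")
```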

Page 45: Regression Models

P value for hypothesis test

Page 46: Regression Models

Intuitive approach: Does the confidence interval contain zero?

Hypothesis: β1 = 0

The confidence interval contains the set of plausible values of β1 based on the data and the test.

If the confidence interval does not contain 0, reject H0: β1 = 0.

Page 47: Regression Models

More General Test

For Testing a Hypothesis about a Regression Coefficient

Hypothesis: $H_0: \beta_1 = B$, $H_1: \beta_1 \neq B$

Estimator: $b_1$

Standard Error:

$$ s_{b_1} = \frac{s_e}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^2}} = \sqrt{\frac{\frac{1}{N-2}\sum_{i=1}^{N}(y_i-b_0-b_1x_i)^2}{\sum_{i=1}^{N}(x_i-\bar{x})^2}} $$

Test Statistic: $t = \dfrac{b_1 - B}{s_{b_1}}$; t statistic with N-2 D.F.

Rejection Region: |t| > Critical Value from Table

Page 48: Regression Models

$H_0: \beta_1 = 100$; $H_1: \beta_1 \neq 100$

Test statistic:

$$ t = \frac{b_1 - 100}{SE(b_1)} = \frac{72.72 - 100}{10.94} = -2.49 $$

Critical t = -2.00. Since t = -2.49 is below -2.00, H0 is rejected.

Page 49: Regression Models

Summary: Regression Analysis

Investigate: Is the coefficient in a regression model really nonzero?

Testing procedure:
    Model: y = β0 + β1x + ε
    Hypothesis: H0: β1 = B (B = 0 for the test of no relationship)
    Rejection region: Least squares coefficient is far from B.

Test:
    α level for the test = 0.05 as usual
    Compute t = (b1 - B)/StandardError
    Reject H0 if |t| is above the critical value: 1.96 if large sample, value from t table if small sample.
    Reject H0 if reported P value is less than α level

Degrees of Freedom for the t statistic is N-2