Page 1:

ECN 3202

APPLIED ECONOMETRICS

1

Semester 2, 2015/2016

2. Simple linear regression B

Mr. Sydney Armstrong

Lecturer 1

The University of Guyana

Page 2:

PREDICTION

The true value of 𝑦 when 𝑥 takes some particular value 𝑥0 is given by the model as 𝑦0 = 𝛽1 + 𝛽2𝑥0 + 𝑒0.

The OLS-estimated value (or predictor) of 𝑦0 is 𝑦̂0 = 𝑏1 + 𝑏2𝑥0

(note the analogy with the fitted values 𝑦̂𝑖 = 𝑏1 + 𝑏2𝑥𝑖).

Can we trust it? Is it a good predictor?

How big is the prediction error?

Can we construct a confidence interval on the predicted value?

Let us define the prediction error (or forecast error) as 𝑓 = 𝑦0 − 𝑦̂0.

Note: because 𝑦0 is a random variable, the prediction error is also a random variable, with some mean (expectation) and some variance.

Let’s see what they are …

Page 3:

PREDICTION cont.

If assumptions SR1 to SR5 hold, then 𝐸(𝑓) = 𝐸(𝑦0 − 𝑦̂0) = 0,

i.e., 𝑦̂0 is an unbiased predictor of 𝑦0.

Similarly, the variance of the forecast error is

var(𝑓) = 𝜎² [ 1 + 1/𝑁 + (𝑥0 − 𝑥̄)² / Σ(𝑥𝑖 − 𝑥̄)² ]

Page 4:

Note:

• The smaller the variance of the original error (noise), the better (i.e., less noisy) the prediction would be, ceteris paribus.

• The larger the sample size 𝑁, the better (i.e., less noisy) the prediction would be, ceteris paribus.

• The larger the variation in 𝑥 (i.e., the larger Σ(𝑥𝑖 − 𝑥̄)²), the better (i.e., the less noisy) the prediction would be, ceteris paribus.

At which 𝑥0 will the variance of the prediction be smallest?

Page 5:

PREDICTION cont.

With the help of some simple algebra, we can also derive that expression for var(𝑓).

Note: 𝜎² is unknown but, as before, we can replace 𝜎² by its estimate 𝜎̂² (as we did in Lecture 2), and so we obtain the estimated forecast error variance, v̂ar(𝑓).

The square root of the estimated forecast error variance is called the standard error of the forecast, denoted as se(𝑓) = √v̂ar(𝑓).

If SR6 (normality) is correct (or 𝑁 is large), then a 100(1 − α)% confidence interval, or prediction interval, for 𝑦0 is 𝑦̂0 ± 𝑡𝑐 se(𝑓),

where 𝑡𝑐=𝑡(1−𝛼/2,𝑁−2) is the value that leaves an area of 𝛼/2 in the right-tail of a t-distribution with 𝑁 – 2 degrees of freedom.
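To make the mechanics concrete, here is a minimal Python sketch (not part of the original lecture, which uses EViews) that computes 𝑦̂0, se(𝑓) and the prediction interval for a simple regression; the simulated data, seed and variable names are illustrative assumptions only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(5, 35, size=40)                 # hypothetical regressor (e.g., income)
y = 80 + 10 * x + rng.normal(0, 90, size=40)    # hypothetical dependent variable

N = len(y)
xbar, ybar = x.mean(), y.mean()
b2 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)  # OLS slope
b1 = ybar - b2 * xbar                                           # OLS intercept

e_hat = y - (b1 + b2 * x)                       # residuals
sigma2_hat = np.sum(e_hat ** 2) / (N - 2)       # estimate of sigma^2

x0 = 20.0
y0_hat = b1 + b2 * x0                           # point prediction of y at x0
# var(f) = sigma^2 * [1 + 1/N + (x0 - xbar)^2 / sum((x_i - xbar)^2)]
var_f = sigma2_hat * (1 + 1 / N + (x0 - xbar) ** 2 / np.sum((x - xbar) ** 2))
se_f = np.sqrt(var_f)                           # standard error of the forecast

alpha = 0.05
t_c = stats.t.ppf(1 - alpha / 2, df=N - 2)      # t critical value, N-2 df
print(f"prediction {y0_hat:.2f}, 95% PI [{y0_hat - t_c * se_f:.2f}, {y0_hat + t_c * se_f:.2f}]")
```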

Page 6:

PREDICTION cont.

If we construct these intervals for 𝑦 at every value of 𝑥 and plot them (dashed curves) together with the fitted regression line, we will see a figure like this:

The smallest variance of the prediction is obtained when 𝑥0 = 𝑥̄; the further 𝑥0 is from 𝑥̄, the larger the variance of the prediction.

Reducing the variance (or standard error) of the prediction error reduces the width of the confidence interval.

Page 7:

Example.

Using 𝑁 = 40 observations on food expenditure and income:

The least squares estimate (or prediction) of 𝑦 given 𝑥 = 20 is 𝑦̂0 = 287.61.

The standard error of the forecast is se(𝑓) = 90.63.

So, the 95%-prediction interval for 𝑦0, given 𝑥 = 20 is:

287.61 ± (2.024)(90.63), or 287.61 ± 183.43
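As a quick check of the arithmetic, the quoted numbers can be plugged in directly (a hypothetical snippet, not from the lecture):

```python
y0_hat, t_c, se_f = 287.61, 2.024, 90.63          # values quoted on the slide
half_width = t_c * se_f                           # ≈ 183.43
print(y0_hat - half_width, y0_hat + half_width)   # ≈ 104.18 and 471.04
```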

Page 8:

Intuitively: we are 95% confident that the weekly food expenditure of a person with 𝑥 = 20 is between $104.18 and $471.04.

Page 9:

Example cont.

The estimated confidence intervals might be somewhat disappointing: they are too wide! Well, statistics is a very powerful tool … but it is not a crystal ball!

Remember, confidence intervals (CI) in general depend on:

•variance of the original error (noise)

•sample size (𝑁)

•variation in explanatory variable (𝑥)

… and, in addition, for prediction CI,

the smallest variance of the prediction is

obtained when 𝑥0 = 𝑥̄ …

Page 10:

What about the data we used? 𝑁 is small and there is a large variation in 𝑦 …

Maybe other important explanatory variables are missing?

What we need is a measure of how well a regression fits the data!

•This measure would then indicate (before we estimate prediction intervals!) how reliable predictions based on such a regression would be.

Page 11:

GOODNESS OF FIT

Let 𝑦𝑖 = 𝑦̂𝑖 + 𝑒̂𝑖; then 𝑦𝑖 − 𝑦̄ = (𝑦̂𝑖 − 𝑦̄) + 𝑒̂𝑖.

Now square both sides of the last equation and sum over all 𝑖 (for OLS with an intercept the cross-product term vanishes) to get:

Σ(𝑦𝑖 − 𝑦̄)² = Σ(𝑦̂𝑖 − 𝑦̄)² + Σ𝑒̂𝑖²

Page 12:

SST = Σ(𝑦𝑖 − 𝑦̄)² = total sum of squares (a measure of total variation in the dependent variable about its sample mean)

SSR = Σ(𝑦̂𝑖 − 𝑦̄)² = regression sum of squares (the part that is explained by the regression)

SSE = Σ𝑒̂𝑖² = sum of squared errors (the part of the total variation that is left unexplained)

so that SST = SSR + SSE.

Page 14:

GOODNESS OF FIT cont.

Now, let us measure how much of the total variation in the dependent variable 𝑦 (i.e., of 𝑆𝑆𝑇) is explained by our estimated regression, i.e., by 𝑆𝑆𝑅. For this, we can use

𝑅² = 𝑆𝑆𝑅/𝑆𝑆𝑇 = 1 − 𝑆𝑆𝐸/𝑆𝑆𝑇

This 𝑅² is called the coefficient of determination.

If 𝑅²=1 the data fall exactly on the fitted OLS regression line, in

which case we call it a perfect fit.

If the sample data for 𝑦 and 𝑥 are uncorrelated and show no linear

association, then the fitted OLS line is “horizontal,” so 𝑆𝑆𝑅 = 0 and so

𝑅² = 0.

Also note: for a simple regression model, 𝑅² can also be computed as the square of the sample correlation coefficient between 𝑦𝑖 and 𝑦̂𝑖.
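A small Python sketch (illustrative only, on simulated data rather than the lecture's food-expenditure sample) showing the SST = SSR + SSE decomposition and the two equivalent ways of computing 𝑅²:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(5, 35, size=40)
y = 80 + 10 * x + rng.normal(0, 90, size=40)   # hypothetical data

xbar, ybar = x.mean(), y.mean()
b2 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b1 = ybar - b2 * xbar
y_hat = b1 + b2 * x                            # fitted values

SST = np.sum((y - ybar) ** 2)       # total variation about the sample mean
SSR = np.sum((y_hat - ybar) ** 2)   # part explained by the regression
SSE = np.sum((y - y_hat) ** 2)      # unexplained part
print(np.isclose(SST, SSR + SSE))   # decomposition SST = SSR + SSE holds

R2 = SSR / SST
R2_alt = np.corrcoef(y, y_hat)[0, 1] ** 2   # squared correlation between y and y-hat
print(R2, R2_alt)                           # the two coincide for OLS with an intercept
```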

Page 15:

Example

Using 𝑁 = 40 observations on income and food

expenditure:

Page 16:

Example cont.: Output from EViews

A common format for reporting regression results:

Page 17:

Example cont.

We conclude that 38.5% of the variation in food

expenditure (about its sample mean) is explained by the

variation in x.

Page 18:

THE EFFECTS OF SCALING THE DATA

Changing the scale of 𝑥 into 𝑥* = 𝑥/𝑐: since 𝑦 = 𝑏1 + 𝑏2𝑥 = 𝑏1 + (𝑐𝑏2)(𝑥/𝑐), the estimated slope (and its standard error) is multiplied by 𝑐, while the intercept is unchanged.

Page 19:

Example – food expenditure cont.

Measuring food expenditure in dollars and income in $100:

Food expenditure and income in dollars (i.e., x* = 100x):

Food expenditure in $100 and income in $100 (i.e., 𝑦* = 𝑦/100):

… but t-statistics and 𝑅² are unaffected!
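The scaling result can be verified numerically; the sketch below is a hypothetical illustration (simulated "income" and "expenditure" data, helper name ols_simple assumed) showing that dividing 𝑥 by 𝑐 = 100 multiplies the slope and its standard error by 𝑐 while leaving the t-statistic and 𝑅² unchanged.

```python
import numpy as np

def ols_simple(x, y):
    """Simple OLS of y on x; returns slope, its std. error, t-statistic and R^2."""
    N = len(y)
    xbar, ybar = x.mean(), y.mean()
    b2 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b1 = ybar - b2 * xbar
    e = y - (b1 + b2 * x)
    sigma2 = np.sum(e ** 2) / (N - 2)
    se_b2 = np.sqrt(sigma2 / np.sum((x - xbar) ** 2))
    R2 = 1 - np.sum(e ** 2) / np.sum((y - ybar) ** 2)
    return b2, se_b2, b2 / se_b2, R2

rng = np.random.default_rng(2)
x = rng.uniform(500, 3500, size=40)             # hypothetical income in dollars
y = 80 + 0.10 * x + rng.normal(0, 90, size=40)  # hypothetical food expenditure

c = 100.0
print(ols_simple(x, y))       # income in dollars
print(ols_simple(x / c, y))   # income in $100: slope and se multiplied by c,
                              # t-statistic and R^2 unchanged
```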

Page 20:

CHOOSING A FUNCTIONAL FORM

Different functional forms imply different relationships between 𝑦 and 𝑥 …

… and certainly different estimated coefficients!

So, one must choose the functional form carefully! …

Page 21:

Linear Functional Form

Model: 𝑦 = 𝛽1 + 𝛽2𝑥 + 𝑒

Slope (‘Marginal effect’): d𝑦/d𝑥 = 𝛽2

Meaning of slope: a one-unit increase in 𝑥 leads to a 𝛽2-unit change in 𝑦 (in whatever units 𝑥 and 𝑦 are measured).

Measure of Elasticity: 𝜀 = (d𝑦/d𝑥)(𝑥/𝑦) = 𝛽2𝑥/𝑦

Meaning of Elasticity: the elasticity measures the percentage change in 𝑦 with respect to a one-percent change in 𝑥;

• it may vary across 𝑥 (in spite of the linear relationship between 𝑥 and 𝑦!);

• it might be a more convenient measure of the impact of 𝑥 on 𝑦 than the slope (see the sketch below).
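For instance, with hypothetical estimates 𝑏1 = 83 and 𝑏2 = 10 (assumed values chosen only for illustration, not the lecture's reported coefficients), the elasticity 𝑏2𝑥/𝑦 can be evaluated at several points to see how it varies along a linear fit:

```python
# Elasticity in the linear model y = b1 + b2*x: eps = b2 * x / y, which varies with x.
b1, b2 = 83.0, 10.0                      # hypothetical estimates, for illustration only
for x in (10.0, 20.0, 30.0):
    y = b1 + b2 * x
    print(x, round(b2 * x / y, 3))       # elasticity evaluated at different points
```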

Page 22:

Log-Log Functional Form

Suppose the true model is ln(𝑦) = 𝛽1 + 𝛽2 ln(𝑥) + 𝑒.

Let’s transform it into 𝑦* = 𝛽1 + 𝛽2𝑥* + 𝑒,

where 𝑦* = ln(𝑦) and 𝑥* = ln(𝑥), i.e., the natural logarithms of 𝑦 and 𝑥.

So, the slope coefficient in the transformed model is 𝛽2, i.e., d𝑦*/d𝑥* = d ln(𝑦)/d ln(𝑥) = 𝛽2.

Also note that d ln(𝑦)/d ln(𝑥) = (d𝑦/𝑦)/(d𝑥/𝑥), i.e., the ratio of the percentage change in 𝑦 to the percentage change in 𝑥 …

Page 23:

i.e., the slope coefficient 𝛽2 in the log-log model is the elasticity of 𝑦 with respect to 𝑥! Also note that, for the log-log model, the elasticity of 𝑦 with respect to 𝑥 is constant (= 𝛽2)!

To use this model we must have 𝑦 > 0 and 𝑥 > 0.
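A short simulation (illustrative assumption: data generated with a constant elasticity of 0.7) confirms that the OLS slope of the log-log regression recovers that elasticity:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(5, 35, size=200)
# data generated with a constant elasticity of 0.7 (an assumption for illustration)
y = np.exp(4.0 + 0.7 * np.log(x) + rng.normal(0, 0.1, size=200))

ystar, xstar = np.log(y), np.log(x)    # transform both variables to natural logs
b2 = (np.sum((xstar - xstar.mean()) * (ystar - ystar.mean()))
      / np.sum((xstar - xstar.mean()) ** 2))
print(b2)   # close to 0.7: the log-log slope is the (constant) elasticity
```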

Page 24:

Reciprocal Functional Form

Model: 𝑦 = 𝛽1 + 𝛽2(1/𝑥) + 𝑒

Slope: d𝑦/d𝑥 = −𝛽2/𝑥²

Elasticity: 𝜀 = (d𝑦/d𝑥)(𝑥/𝑦) = −𝛽2/(𝑥𝑦)

Page 25:

Common Functional Forms

Page 26:

Examples

Page 27:

Examples cont.

Page 28:

Examples cont.

Page 29:

A Practical Approach

We should choose a functional form that …

• is consistent with what economic theory tells us about the relationship between 𝑥 and 𝑦;

• is compatible with assumptions SR2 to SR6; and

• is flexible enough to fit the data.

In practice, this involves …

• plotting the data and choosing economically plausible models;

• testing hypotheses concerning the parameters;

• performing residual analysis;

• assessing forecasting performance;

• measuring goodness-of-fit; and

• using the principle of parsimony.

Page 30:

Example – food expenditure cont.: which model to use?

Linear:

Linear-log:

Log-linear:

All slope coefficients are significantly different from zero at the 1% level of significance …

So, which of the models shall we trust more?

Can we just compare the 𝑅² values and choose the model with the highest one?

No! It is not so simple …

Page 31:

Remarks on Goodness-of-Fit

In the linear model, 𝑅² measures how well the model explains the variation in 𝑦.

In the log-linear model, 𝑅² measures how well the model explains the variation in ln(𝑦).

So, the two measures should not be compared!

To compare goodness-of-fit in models with different dependent variables, we can compute the generalised 𝑅²:

𝑅²𝑔 = [corr(𝑦, 𝑦̂)]²

where 𝑦̂ is the fitted value of 𝑦 from the estimated regression of the particular model of interest and corr(𝑦, 𝑦̂) is the sample correlation coefficient between 𝒚 and 𝒚̂.
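A minimal sketch of the generalised 𝑅², assuming simulated data and a hand-rolled log-linear fit (names and numbers are illustrative, not the lecture's):

```python
import numpy as np

def generalised_r2(y, y_hat):
    """Generalised R^2: squared sample correlation between y and its fitted values."""
    return np.corrcoef(y, y_hat)[0, 1] ** 2

rng = np.random.default_rng(4)
x = rng.uniform(5, 35, size=40)
y = np.exp(3.5 + 0.05 * x + rng.normal(0, 0.3, size=40))  # hypothetical data

# Log-linear model: regress ln(y) on x, then convert the prediction back to levels
ln_y = np.log(y)
b2 = np.sum((x - x.mean()) * (ln_y - ln_y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = ln_y.mean() - b2 * x.mean()
y_hat_levels = np.exp(b1 + b2 * x)       # 'natural' prediction of y in levels

print(generalised_r2(y, y_hat_levels))   # now comparable with the linear model's R^2
```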

Page 32:

Example – food expenditure cont.

Linear model:

Log-linear model:

Note: for the log-linear model, we can compute the generalised 𝑅² using either the ‘natural’ or the ‘corrected’ predictions: because they differ only by a constant factor, 𝑅²𝑔 would be the same, since correlation is not affected by multiplication by a constant.

Conclusion: in our example, both models have very similar (and not very high!) 𝑅² and so can be deemed to fit the data similarly well, with the linear model fitting slightly better; it might therefore be preferred for the sake of simplicity (the parsimony principle!).

Page 33:

TESTING FOR NORMALLY DISTRIBUTED ERRORS

The k-th central moment of the random variable 𝑒 is 𝜇𝑘 = 𝐸[(𝑒 − 𝜇)^𝑘],

where 𝜇 denotes the mean (the first moment!) of 𝑒.

Measures of spread (dispersion), symmetry and

“peakedness” are:

True Variance: 𝜇2 = 𝜎²

True Skewness: 𝑆 = 𝜇3/𝜎³

True Kurtosis: 𝐾 = 𝜇4/𝜎⁴

If 𝑒 is normally distributed then 𝑆 = 0 and 𝐾 = 3.

Page 34:

The Jarque-Bera Test

There are many tests for normality of the errors (or residuals).

The idea of the most popular test for normality, called the Jarque-Bera test, is based on testing how far the measures of residual skewness and kurtosis are from 0 and 3, respectively. In particular:

𝑯𝟎: errors are normal; 𝑯𝟏: errors are non-normal

Test statistic: 𝐽𝐵 = (𝑁/6)[𝑆² + (𝐾 − 3)²/4]

Decision: reject 𝐻0 if the value of 𝐽𝐵 is beyond the critical value from 𝜒²(2) for the chosen 𝛼 or, simply, if the p-value is less than or equal to 𝛼.

Conclusion: If 𝐻0 is rejected ⇒ unlikely that errors are normal.

If 𝐻0 is not rejected ⇒ can’t say we accept 𝐻0 as the truth …

… but have more confidence in assumption that errors are normal…
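A hedged Python sketch of the Jarque-Bera computation (the residuals here are simulated stand-ins, not the lecture's EViews residuals):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
e_hat = rng.normal(0, 90, size=40)        # simulated stand-in for OLS residuals

N = len(e_hat)
d = e_hat - e_hat.mean()
m2, m3, m4 = np.mean(d ** 2), np.mean(d ** 3), np.mean(d ** 4)  # central moments
S = m3 / m2 ** 1.5                        # sample skewness
K = m4 / m2 ** 2                          # sample kurtosis

JB = (N / 6) * (S ** 2 + (K - 3) ** 2 / 4)   # Jarque-Bera statistic
p_value = 1 - stats.chi2.cdf(JB, df=2)       # compared with a chi-square(2)
print(JB, p_value)
```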

Page 35:

Example – food expenditure cont.

Hypotheses: 𝐻0: errors are normal; 𝐻1: not 𝐻0.

Test statistic:

Decision rule: reject 𝐻0 if 𝐽𝐵 exceeds the critical value from 𝜒²(2) for the chosen 𝛼 (or if the p-value ≤ 𝛼).

Decision: do not reject 𝐻0.

Conclusion: we cannot reject the hypothesis of normally distributed errors with our data and assumptions.

Although, based on the inability to reject normality of the errors, we can’t claim the errors are normal (i.e., we can’t accept 𝐻0), we gain more confidence in making the assumption of normality, if we need it …

Page 37:

Prediction and Functional Forms

When doing predictions, one must remember the units of measurement and the functional form used.

For example, in the case of the log-linear regression model, the fitted regression line predicts ln(𝑦): ln(𝑦)̂ = 𝑏1 + 𝑏2𝑥

… but we need to predict 𝑦! How can we get one from the other?

A “natural” predictor of 𝑦 is 𝑦̂𝑛 = exp(ln(𝑦)̂) = exp(𝑏1 + 𝑏2𝑥).

However, if assumption SR6 holds, then a better (“corrected”) predictor is 𝑦̂𝑐 = 𝑦̂𝑛 exp(𝜎̂²/2).
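A small sketch of the two predictors, with assumed (purely illustrative) values for 𝑏1, 𝑏2 and 𝜎̂²:

```python
import numpy as np

# Hypothetical log-linear estimates ln(y)-hat = b1 + b2*x and error-variance estimate
b1, b2, sigma2_hat = 4.0, 0.06, 0.20   # assumed values, for illustration only
x0 = 20.0

ln_y_hat = b1 + b2 * x0
y_natural = np.exp(ln_y_hat)                      # "natural" predictor
y_corrected = y_natural * np.exp(sigma2_hat / 2)  # corrected predictor (under SR6)
print(y_natural, y_corrected)
```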

Page 38:

Example

The estimated log-linear model:

The natural prediction of y given, for example, 𝑥 = 20 is

A better prediction:

Page 39:

Prediction Intervals for Log-Linear Models

For the purpose of prediction intervals in a log-linear model, it is easier to use the natural predictor (because the corrected predictor includes the estimated error variance, making the t-distribution no longer applicable):

• get the prediction interval for ln(𝑦) in the usual manner;

• then take the antilog of this interval …

Specifically, if SR6 (normality) is correct (or 𝑁 is large), then a 100(1 − α)% prediction interval for 𝑦0 is

[ exp(ln(𝑦0)̂ − 𝑡𝑐 se(𝑓)), exp(ln(𝑦0)̂ + 𝑡𝑐 se(𝑓)) ]

where 𝑡𝑐 = 𝑡(1−𝛼/2, 𝑁−2) is the value that leaves an area of 𝛼/2 in the right tail of a t-distribution with 𝑁 − 2 degrees of freedom.
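Finally, a hypothetical sketch of such an interval (the numbers for ln(𝑦0)̂ and se(𝑓) are assumed for illustration only):

```python
import numpy as np
from scipy import stats

# Assumed quantities from a log-linear regression, purely for illustration
ln_y0_hat, se_f, N, alpha = 5.2, 0.48, 40, 0.05

t_c = stats.t.ppf(1 - alpha / 2, df=N - 2)
lower = np.exp(ln_y0_hat - t_c * se_f)   # antilog of the lower endpoint for ln(y0)
upper = np.exp(ln_y0_hat + t_c * se_f)   # antilog of the upper endpoint
print(lower, upper)                      # prediction interval for y0 itself
```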