Page 1:

ECN 3202

APPLIED ECONOMETRICS

1

Semester 2, 2015/2016

2. Simple linear regression B

Mr. Sydney Armstrong

Lecturer 1

The University of Guyana

Page 2:

PREDICTION

The true value of 𝑦 when 𝑥 takes some particular value 𝑥0 is given by the model as 𝑦0 = 𝛽1 + 𝛽2𝑥0 + 𝑒0.

The OLS-estimated value (or predictor) of 𝑦0 is 𝑦̂0 = 𝑏1 + 𝑏2𝑥0

(note the analogy with the fitted values 𝑦̂𝑖 = 𝑏1 + 𝑏2𝑥𝑖).

Can we trust it? Is it a good predictor?

How big is the prediction error?

Can we construct a confidence interval on the predicted value?

Let us define the prediction error (or forecast error) as 𝑓 = 𝑦0 − 𝑦̂0.

Note: because 𝑦0 is a random variable, the prediction error is also a random variable, with some mean (expectation) and some variance.

Let’s see what they are …

Page 3:

PREDICTION cont.

If assumptions SR1 to SR5 hold, then 𝐸(𝑓) = 𝐸(𝑦0 − 𝑦̂0) = 0,

i.e., 𝑦̂0 is an unbiased predictor of 𝑦0.

Similarly, the variance of the forecast error is

var(𝑓) = 𝜎² [ 1 + 1/𝑁 + (𝑥0 − 𝑥̄)² / Σ(𝑥𝑖 − 𝑥̄)² ]

Page 4:

Note:

• The smaller the variance of the original error (noise), the better (i.e., less noisy) the prediction would be, ceteris paribus.

• The larger the sample size 𝑁, the better (i.e., less noisy) the prediction would be, ceteris paribus.

• The larger the variation in 𝑥 (i.e., the larger Σ(𝑥𝑖 − 𝑥̄)²), the better (i.e., the less noisy) the prediction would be, ceteris paribus.

At which 𝑥0 will the variance of the prediction be smallest?

Page 5:

PREDICTION cont.

With the help of some simple algebra, we can also derive that expression for var(𝑓).

Note: 𝜎² is unknown but, as before, we can replace 𝜎² by its estimate 𝜎̂² (as we did in Lecture 2), and so we obtain the estimated forecast error variance, v̂ar(𝑓).

The square root of the estimated forecast error variance is called the standard error of the forecast, denoted as se(𝑓) = √v̂ar(𝑓).

If SR6 (normality) is correct (or 𝑁 is large), then a 100(1 − α)% confidence interval, or prediction interval, for 𝑦0 is 𝑦̂0 ± 𝑡𝑐 se(𝑓),

where 𝑡𝑐=𝑡(1−𝛼/2,𝑁−2) is the value that leaves an area of 𝛼/2 in the right-tail of a t-distribution with 𝑁 – 2 degrees of freedom.
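To make the mechanics concrete, here is a minimal Python sketch (not part of the original lecture, which uses EViews) that computes 𝑦̂0, se(𝑓) and the prediction interval for a simple regression; the simulated data, seed and variable names are illustrative assumptions only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(5, 35, size=40)                 # hypothetical regressor (e.g., income)
y = 80 + 10 * x + rng.normal(0, 90, size=40)    # hypothetical dependent variable

N = len(y)
xbar, ybar = x.mean(), y.mean()
b2 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)  # OLS slope
b1 = ybar - b2 * xbar                                           # OLS intercept

e_hat = y - (b1 + b2 * x)                       # residuals
sigma2_hat = np.sum(e_hat ** 2) / (N - 2)       # estimate of sigma^2

x0 = 20.0
y0_hat = b1 + b2 * x0                           # point prediction of y at x0
# var(f) = sigma^2 * [1 + 1/N + (x0 - xbar)^2 / sum((x_i - xbar)^2)]
var_f = sigma2_hat * (1 + 1 / N + (x0 - xbar) ** 2 / np.sum((x - xbar) ** 2))
se_f = np.sqrt(var_f)                           # standard error of the forecast

alpha = 0.05
t_c = stats.t.ppf(1 - alpha / 2, df=N - 2)      # t critical value, N-2 df
print(f"prediction {y0_hat:.2f}, 95% PI [{y0_hat - t_c * se_f:.2f}, {y0_hat + t_c * se_f:.2f}]")
```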

Page 6:

PREDICTION cont.

If we construct these intervals for 𝑦 at every value of 𝑥 and plot them (dashed curves) together with the fitted regression line, we will see a figure like this:

The smallest variance of the prediction is obtained when 𝑥0 = 𝑥̄; the further 𝑥0 is from 𝑥̄, the larger the variance of the prediction.

Reducing the variance (or standard error) of the prediction error reduces the width of the confidence interval.

Page 7:

Example.

Using 𝑁 = 40 observations on food expenditure and income:

The least squares estimate (or prediction) of 𝑦 given 𝑥 = 20 is 𝑦̂0 = 287.61.

The standard error of the forecast is se(𝑓) = 90.63.

So, the 95%-prediction interval for 𝑦0, given 𝑥 = 20 is:

287.61 ± (2.024)(90.63), or 287.61 ± 183.43
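As a quick check of the arithmetic, the quoted numbers can be plugged in directly (a hypothetical snippet, not from the lecture):

```python
y0_hat, t_c, se_f = 287.61, 2.024, 90.63          # values quoted on the slide
half_width = t_c * se_f                           # ≈ 183.43
print(y0_hat - half_width, y0_hat + half_width)   # ≈ 104.18 and 471.04
```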

Page 8:

Intuitively: we are 95% confident that the weekly food expenditure of a person with 𝑥 = 20 is between $104.18 and $471.04.

Page 9:

Example cont.

The estimated confidence intervals might be somewhat disappointing: they are too wide! Well, statistics is a very powerful tool … but it is not a crystal ball!

Remember, confidence intervals (CI) in general depend on:

•variance of the original error (noise)

•sample size (𝑁)

•variation in explanatory variable (𝑥)

… and, in addition, for prediction CI,

the smallest variance of the prediction is

obtained when 𝑥0 = 𝑥̄ …

Page 10:

What about the data we used? 𝑁 is small and there is a large variation in 𝑦 …

Maybe other important explanatory variables are missing?

What we need is a measure of how well a regression fits the data!

•This measure would then indicate (before we estimate prediction intervals!) how reliable predictions based on such a regression would be.

Page 11:

GOODNESS OF FIT

Let 𝑦𝑖 = 𝑦̂𝑖 + 𝑒̂𝑖; then 𝑦𝑖 − 𝑦̄ = (𝑦̂𝑖 − 𝑦̄) + 𝑒̂𝑖.

Now square both sides of the last equation and sum over all 𝑖 (for OLS with an intercept the cross-product term vanishes) to get:

Σ(𝑦𝑖 − 𝑦̄)² = Σ(𝑦̂𝑖 − 𝑦̄)² + Σ𝑒̂𝑖²

Page 12:

SST = Σ(𝑦𝑖 − 𝑦̄)² = total sum of squares (a measure of total variation in the dependent variable about its sample mean)

SSR = Σ(𝑦̂𝑖 − 𝑦̄)² = regression sum of squares (the part that is explained by the regression)

SSE = Σ𝑒̂𝑖² = sum of squared errors (the part of the total variation that is left unexplained)

so that SST = SSR + SSE.

Page 14:

GOODNESS OF FIT cont.

Now, let us measure how much of the total variation in the dependent variable 𝑦 (i.e., of 𝑆𝑆𝑇) is explained by our estimated regression, i.e., by 𝑆𝑆𝑅. For this, we can use

𝑅² = 𝑆𝑆𝑅/𝑆𝑆𝑇 = 1 − 𝑆𝑆𝐸/𝑆𝑆𝑇

This 𝑅² is called the coefficient of determination.

If 𝑅²=1 the data fall exactly on the fitted OLS regression line, in

which case we call it a perfect fit.

If the sample data for 𝑦 and 𝑥 are uncorrelated and show no linear

association, then the fitted OLS line is “horizontal,” so 𝑆𝑆𝑅 = 0 and so

𝑅² = 0.

Also note: for a simple regression model, 𝑅² can also be computed as the square of the sample correlation coefficient between 𝑦𝑖 and 𝑦̂𝑖.
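A small Python sketch (illustrative only, on simulated data rather than the lecture's food-expenditure sample) showing the SST = SSR + SSE decomposition and the two equivalent ways of computing 𝑅²:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(5, 35, size=40)
y = 80 + 10 * x + rng.normal(0, 90, size=40)   # hypothetical data

xbar, ybar = x.mean(), y.mean()
b2 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b1 = ybar - b2 * xbar
y_hat = b1 + b2 * x                            # fitted values

SST = np.sum((y - ybar) ** 2)       # total variation about the sample mean
SSR = np.sum((y_hat - ybar) ** 2)   # part explained by the regression
SSE = np.sum((y - y_hat) ** 2)      # unexplained part
print(np.isclose(SST, SSR + SSE))   # decomposition SST = SSR + SSE holds

R2 = SSR / SST
R2_alt = np.corrcoef(y, y_hat)[0, 1] ** 2   # squared correlation between y and y-hat
print(R2, R2_alt)                           # the two coincide for OLS with an intercept
```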

Page 15:

Example

Using 𝑁 = 40 observations on income and food

expenditure:

Page 16:

Example cont.: Output from EViews

A common format for reporting regression results:

Page 17:

Example cont.

We conclude that 38.5% of the variation in food

expenditure (about its sample mean) is explained by the

variation in x.

Page 18:

THE EFFECTS OF SCALING THE DATA

Changing the scale of 𝑥 into 𝑥* = 𝑥/𝑐: since 𝑦 = 𝑏1 + 𝑏2𝑥 = 𝑏1 + (𝑐𝑏2)(𝑥/𝑐), the estimated slope (and its standard error) is multiplied by 𝑐, while the intercept is unchanged.

Page 19:

Example – food expenditure cont.

Measuring food expenditure in dollars and income in $100:

Food expenditure and income in dollars (i.e., x* = 100x):

Food expenditure in $100 and income in $100 (i.e., 𝑦* = 𝑦/100):

… but t-statistics and 𝑅² are unaffected!
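The scaling result can be verified numerically; the sketch below is a hypothetical illustration (simulated "income" and "expenditure" data, helper name ols_simple assumed) showing that dividing 𝑥 by 𝑐 = 100 multiplies the slope and its standard error by 𝑐 while leaving the t-statistic and 𝑅² unchanged.

```python
import numpy as np

def ols_simple(x, y):
    """Simple OLS of y on x; returns slope, its std. error, t-statistic and R^2."""
    N = len(y)
    xbar, ybar = x.mean(), y.mean()
    b2 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b1 = ybar - b2 * xbar
    e = y - (b1 + b2 * x)
    sigma2 = np.sum(e ** 2) / (N - 2)
    se_b2 = np.sqrt(sigma2 / np.sum((x - xbar) ** 2))
    R2 = 1 - np.sum(e ** 2) / np.sum((y - ybar) ** 2)
    return b2, se_b2, b2 / se_b2, R2

rng = np.random.default_rng(2)
x = rng.uniform(500, 3500, size=40)             # hypothetical income in dollars
y = 80 + 0.10 * x + rng.normal(0, 90, size=40)  # hypothetical food expenditure

c = 100.0
print(ols_simple(x, y))       # income in dollars
print(ols_simple(x / c, y))   # income in $100: slope and se multiplied by c,
                              # t-statistic and R^2 unchanged
```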

Page 20:

CHOOSING A FUNCTIONAL FORM

Different functional forms imply different relationships between 𝑦 and 𝑥 …

… and certainly different estimated coefficients!

So, one must choose the functional form carefully! …

Page 21:

Linear Functional Form

Model: 𝑦 = 𝛽1 + 𝛽2𝑥 + 𝑒

Slope (‘Marginal effect’): d𝑦/d𝑥 = 𝛽2

Meaning of slope: a one-unit increase in 𝑥 leads to a 𝛽2-unit change in 𝑦 (in whatever units 𝑥 and 𝑦 are measured).

Measure of Elasticity: 𝜀 = (d𝑦/d𝑥)(𝑥/𝑦) = 𝛽2𝑥/𝑦

Meaning of Elasticity: the elasticity measures the percentage change in 𝑦 with respect to a one-percent change in 𝑥;

• it may vary across 𝑥 (in spite of the linear relationship between 𝑥 and 𝑦!);

• it might be a more convenient measure of the impact of 𝑥 on 𝑦 than the slope (see the sketch below).
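For instance, with hypothetical estimates 𝑏1 = 83 and 𝑏2 = 10 (assumed values chosen only for illustration, not the lecture's reported coefficients), the elasticity 𝑏2𝑥/𝑦 can be evaluated at several points to see how it varies along a linear fit:

```python
# Elasticity in the linear model y = b1 + b2*x: eps = b2 * x / y, which varies with x.
b1, b2 = 83.0, 10.0                      # hypothetical estimates, for illustration only
for x in (10.0, 20.0, 30.0):
    y = b1 + b2 * x
    print(x, round(b2 * x / y, 3))       # elasticity evaluated at different points
```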

Page 22:

Log-Log Functional Form

Suppose the true model is ln(𝑦) = 𝛽1 + 𝛽2 ln(𝑥) + 𝑒.

Let’s transform it into 𝑦* = 𝛽1 + 𝛽2𝑥* + 𝑒,

where 𝑦* = ln(𝑦) and 𝑥* = ln(𝑥), i.e., the natural logarithms of 𝑦 and 𝑥.

So, the slope coefficient in the transformed model is 𝛽2, i.e., d𝑦*/d𝑥* = d ln(𝑦)/d ln(𝑥) = 𝛽2.

Also note that d ln(𝑦)/d ln(𝑥) = (d𝑦/𝑦)/(d𝑥/𝑥), i.e., the ratio of the percentage change in 𝑦 to the percentage change in 𝑥 …

Page 23:

i.e., the slope coefficient 𝛽2 in the log-log model is the elasticity of 𝑦 with respect to 𝑥! Also note that, for the log-log model, the elasticity of 𝑦 with respect to 𝑥 is constant (= 𝛽2)!

To use this model we must have 𝑦 > 0 and 𝑥 > 0.
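A short simulation (illustrative assumption: data generated with a constant elasticity of 0.7) confirms that the OLS slope of the log-log regression recovers that elasticity:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(5, 35, size=200)
# data generated with a constant elasticity of 0.7 (an assumption for illustration)
y = np.exp(4.0 + 0.7 * np.log(x) + rng.normal(0, 0.1, size=200))

ystar, xstar = np.log(y), np.log(x)    # transform both variables to natural logs
b2 = (np.sum((xstar - xstar.mean()) * (ystar - ystar.mean()))
      / np.sum((xstar - xstar.mean()) ** 2))
print(b2)   # close to 0.7: the log-log slope is the (constant) elasticity
```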

Page 24:

Reciprocal Functional Form

Model: 𝑦 = 𝛽1 + 𝛽2(1/𝑥) + 𝑒

Slope: d𝑦/d𝑥 = −𝛽2/𝑥²

Elasticity: 𝜀 = (d𝑦/d𝑥)(𝑥/𝑦) = −𝛽2/(𝑥𝑦)

Page 25:

Common Functional Forms

Page 26:

Examples

Page 27:

Examples cont.

Page 28:

Examples cont.

Page 29:

A Practical Approach

We should choose a functional form that …

• is consistent with what economic theory tells us about the relationship between 𝑥 and 𝑦;

• is compatible with assumptions SR2 to SR6; and

• is flexible enough to fit the data.

In practice, this involves …

• plotting the data and choosing economically plausible models;

• testing hypotheses concerning the parameters;

• performing residual analysis;

• assessing forecasting performance;

• measuring goodness-of-fit; and

• using the principle of parsimony.

Page 30:

Example – food expenditure cont.: which model to use?

Linear:

Linear-log:

Log-linear:

All slope coefficients are significantly different from zero at the 1% level of significance …

So, which of the models shall we trust more?

Can we just compare the 𝑅² values and choose the model with the highest one?

No! It is not so simple …

Page 31:

Remarks on Goodness-of-Fit

In the linear model, 𝑅² measures how well the model explains the variation in 𝑦.

In the log-linear model, 𝑅² measures how well the model explains the variation in ln(𝑦).

So, the two measures should not be compared!

To compare goodness-of-fit in models with different dependent variables, we can compute the generalised 𝑅²:

𝑅²𝑔 = [corr(𝑦, 𝑦̂)]²

where 𝑦̂ is the fitted value of 𝑦 from the estimated regression of the particular model of interest and corr(𝑦, 𝑦̂) is the sample correlation coefficient between 𝒚 and 𝒚̂.
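A minimal sketch of the generalised 𝑅², assuming simulated data and a hand-rolled log-linear fit (names and numbers are illustrative, not the lecture's):

```python
import numpy as np

def generalised_r2(y, y_hat):
    """Generalised R^2: squared sample correlation between y and its fitted values."""
    return np.corrcoef(y, y_hat)[0, 1] ** 2

rng = np.random.default_rng(4)
x = rng.uniform(5, 35, size=40)
y = np.exp(3.5 + 0.05 * x + rng.normal(0, 0.3, size=40))  # hypothetical data

# Log-linear model: regress ln(y) on x, then convert the prediction back to levels
ln_y = np.log(y)
b2 = np.sum((x - x.mean()) * (ln_y - ln_y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = ln_y.mean() - b2 * x.mean()
y_hat_levels = np.exp(b1 + b2 * x)       # 'natural' prediction of y in levels

print(generalised_r2(y, y_hat_levels))   # now comparable with the linear model's R^2
```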

Page 32:

Example – food expenditure cont.

Linear model:

Log-linear model:

Note: for the log-linear model, we can compute the generalised 𝑅² using either the ‘natural’ or the ‘corrected’ predictions: because they differ only by a constant factor, 𝑅²𝑔 would be the same, since correlation is not affected by multiplication by a constant.

Conclusion: in our example, both models have very similar (and not very high!) 𝑅² and so can be deemed to fit the data similarly well, with the linear model fitting slightly better; it might therefore be preferred for the sake of simplicity (the parsimony principle!).

Page 33:

TESTING FOR NORMALLY DISTRIBUTED ERRORS

The k-th central moment of the random variable 𝑒 is 𝜇𝑘 = 𝐸[(𝑒 − 𝜇)^𝑘],

where 𝜇 denotes the mean (the first moment!) of 𝑒.

Measures of spread (dispersion), symmetry and

“peakedness” are:

True Variance: 𝜇2 = 𝜎²

True Skewness: 𝑆 = 𝜇3/𝜎³

True Kurtosis: 𝐾 = 𝜇4/𝜎⁴

If 𝑒 is normally distributed then 𝑆 = 0 and 𝐾 = 3.

Page 34:

The Jarque-Bera Test

There are many tests for normality of the errors (or residuals).

The idea of the most popular test for normality, called the Jarque-Bera test, is based on testing how far the measures of residual skewness and kurtosis are from 0 and 3, respectively. In particular:

𝑯𝟎: errors are normal; 𝑯𝟏: errors are non-normal

Test statistic: 𝐽𝐵 = (𝑁/6)[𝑆² + (𝐾 − 3)²/4]

Decision: reject 𝐻0 if the value of 𝐽𝐵 is beyond the critical value from 𝜒²(2) for the chosen 𝛼 or, simply, if the p-value is less than or equal to 𝛼.

Conclusion: If 𝐻0 is rejected ⇒ unlikely that errors are normal.

If 𝐻0 is not rejected ⇒ can’t say we accept 𝐻0 as the truth …

… but have more confidence in assumption that errors are normal…
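A hedged Python sketch of the Jarque-Bera computation (the residuals here are simulated stand-ins, not the lecture's EViews residuals):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
e_hat = rng.normal(0, 90, size=40)        # simulated stand-in for OLS residuals

N = len(e_hat)
d = e_hat - e_hat.mean()
m2, m3, m4 = np.mean(d ** 2), np.mean(d ** 3), np.mean(d ** 4)  # central moments
S = m3 / m2 ** 1.5                        # sample skewness
K = m4 / m2 ** 2                          # sample kurtosis

JB = (N / 6) * (S ** 2 + (K - 3) ** 2 / 4)   # Jarque-Bera statistic
p_value = 1 - stats.chi2.cdf(JB, df=2)       # compared with a chi-square(2)
print(JB, p_value)
```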

Page 35:

Example – food expenditure cont.

Hypotheses: 𝐻0: errors are normal; 𝐻1: not 𝐻0.

Test statistic:

Decision rule: reject 𝐻0 if 𝐽𝐵 exceeds the critical value from 𝜒²(2) for the chosen 𝛼 (or if the p-value ≤ 𝛼).

Decision: do not reject 𝐻0.

Conclusion: we cannot reject the hypothesis of normally distributed errors with our data and assumptions.

Although, based on the inability to reject normality of the errors, we can’t claim the errors are normal (i.e., we can’t accept 𝐻0), we gain more confidence in making the assumption of normality, if we need it …

Page 37:

Prediction and Functional Forms

When doing predictions, one must remember the units of measurement and the functional form used.

For example, in the case of the log-linear regression model, the fitted regression line predicts ln(𝑦): ln(𝑦)̂ = 𝑏1 + 𝑏2𝑥

… but we need to predict 𝑦! How can we get one from the other?

A “natural” predictor of 𝑦 is 𝑦̂𝑛 = exp(ln(𝑦)̂) = exp(𝑏1 + 𝑏2𝑥).

However, if assumption SR6 holds, then a better (“corrected”) predictor is 𝑦̂𝑐 = 𝑦̂𝑛 exp(𝜎̂²/2).
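A small sketch of the two predictors, with assumed (purely illustrative) values for 𝑏1, 𝑏2 and 𝜎̂²:

```python
import numpy as np

# Hypothetical log-linear estimates ln(y)-hat = b1 + b2*x and error-variance estimate
b1, b2, sigma2_hat = 4.0, 0.06, 0.20   # assumed values, for illustration only
x0 = 20.0

ln_y_hat = b1 + b2 * x0
y_natural = np.exp(ln_y_hat)                      # "natural" predictor
y_corrected = y_natural * np.exp(sigma2_hat / 2)  # corrected predictor (under SR6)
print(y_natural, y_corrected)
```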

Page 38:

Example

The estimated log-linear model:

The natural prediction of y given, for example, 𝑥 = 20 is

A better prediction:

Page 39:

Prediction Intervals for Log-Linear Models

For the purpose of prediction intervals in a log-linear model, it is easier to use the natural predictor (because the corrected predictor includes the estimated error variance, making the t-distribution no longer applicable):

• get the prediction interval for ln(𝑦) in the usual manner;

• then take the antilog of this interval …

Specifically, if SR6 (normality) is correct (or 𝑁 is large), then a 100(1 − α)% prediction interval for 𝑦0 is

[ exp(ln(𝑦0)̂ − 𝑡𝑐 se(𝑓)), exp(ln(𝑦0)̂ + 𝑡𝑐 se(𝑓)) ]

where 𝑡𝑐 = 𝑡(1−𝛼/2, 𝑁−2) is the value that leaves an area of 𝛼/2 in the right tail of a t-distribution with 𝑁 − 2 degrees of freedom.
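Finally, a hypothetical sketch of such an interval (the numbers for ln(𝑦0)̂ and se(𝑓) are assumed for illustration only):

```python
import numpy as np
from scipy import stats

# Assumed quantities from a log-linear regression, purely for illustration
ln_y0_hat, se_f, N, alpha = 5.2, 0.48, 40, 0.05

t_c = stats.t.ppf(1 - alpha / 2, df=N - 2)
lower = np.exp(ln_y0_hat - t_c * se_f)   # antilog of the lower endpoint for ln(y0)
upper = np.exp(ln_y0_hat + t_c * se_f)   # antilog of the upper endpoint
print(lower, upper)                      # prediction interval for y0 itself
```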