Assumptions of Simple and Multiple Linear Regression Model


Econometrics Project, Group 8, BS Economics 3-1, Academic Year 2014-2015


Polytechnic University of the Philippines

Sta. Mesa, Manila

ECON 3023 ECONOMETRICS

GROUP PROJECT

Submitted By:

GROUP 8

Camacho, Irwin Dave

De Ramos, Liezel

Gonzales, Divina

Oliver, Ralph Laurence

B.S. ECONOMICS 3-1

Submitted To:

Prof. Alberto Guillo

Vice President for Administration, PUP


ASSUMPTIONS OF SIMPLE AND MULTIPLE REGRESSION MODEL

ASSUMPTION 1: LINEAR REGRESSION MODEL

Departures from/Violations of assumptions (both Two-Variable Linear Model and Multiple Variable Linear Model):
-Nonlinearity in parameters
-Wrong regressors
-Changing parameters

Graphical Approach:
-Scatterplot
-Plot of observed versus predicted values
-Plot of residuals versus predicted values (better than the observed-versus-predicted plot for this purpose, because it eliminates the visual distraction of a sloping pattern)

Statistical Approach:
-Chow test

Effect/s of the violation in the model:
-The normal equations for nonlinear regression have the unknowns (the β's) on both the left- and right-hand sides of the equations.
-As a consequence, we cannot obtain explicit solutions for the unknowns in terms of the known quantities.

Remedial Measures:
-Perform a curvilinear transformation: if there appears to be a quadratic pattern in the residuals, a polynomial transformation of degree 2 should be tried.
-Log transformation
-Locally Weighted Scatterplot Smoothing (LOWESS)
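The degree-2 polynomial remedy can be sketched in Python. This is a minimal illustration under assumed data (a noiseless curved relationship y = 1 + 2x + 0.5x²), using a small hypothetical helper `ols` that solves the normal equations directly; in practice a statistics package would be used. The straight-line residuals show the U-shaped pattern described above, and adding an x² term removes it.

```python
# Minimal sketch: compare a straight-line fit with a degree-2 polynomial fit.
# The data and the helper names `ols` and `ssr` are illustrative assumptions.

def ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y, solved by
    Gaussian elimination with partial pivoting. Each row of `rows` holds the
    regressors for one observation (include a leading 1.0 for the intercept)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, k):
            factor = a[row][col] / a[col][col]
            for c in range(col, k + 1):
                a[row][c] -= factor * a[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b

def ssr(rows, y, b):
    """Sum of squared residuals for coefficient vector b."""
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(rows, y))

x = [float(i) for i in range(1, 11)]
y = [1.0 + 2.0 * xi + 0.5 * xi ** 2 for xi in x]   # assumed curved relationship

linear_rows = [[1.0, xi] for xi in x]
quad_rows = [[1.0, xi, xi ** 2] for xi in x]

b_lin, b_quad = ols(linear_rows, y), ols(quad_rows, y)
ssr_lin, ssr_quad = ssr(linear_rows, y, b_lin), ssr(quad_rows, y, b_quad)

# Straight-line residuals: positive at both ends, negative in the middle,
# the quadratic pattern that suggests adding x**2.
resid_lin = [yi - (b_lin[0] + b_lin[1] * xi) for xi, yi in zip(x, y)]
print(b_quad, ssr_lin, ssr_quad)
```

The quadratic fit recovers coefficients close to 1, 2 and 0.5 with a residual sum of squares near zero, while the linear fit leaves a large, systematically patterned residual.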

Sample of violation with remedial measures as discussed in a research output published in a professional research journal:
Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff

Authors & Institutional Affiliations:
-Thomas Herndon
-Michael Ash
-Robert Pollin
-Political Economy Research Institute, University of Massachusetts Amherst

Publication/Journal Title:
-Working Paper Series

Date of Publication, Volume No. and Year:
-April 2013, Number 322

Violation:
-Nonlinear relationship between the variables

How was it detected:
-Scatterplot

Remedial Measure:
-LOWESS

ASSUMPTION 2: X VALUES ARE FIXED IN REPEATED SAMPLING

Departures from/Violations of assumptions (both Two-Variable Linear Model and Multiple Variable Linear Model):
-Stochastic regressor(s)
-Endogeneity

Graphical Approach:
-Graphically examining the cumulative sum of the recursive residuals

Statistical Approach:
-Durbin-Wu-Hausman specification test
-Johansen test
-Engle-Granger two-step method

Effect/s of the violation in the model:
-Simultaneous-equation (simultaneity) problem
-The estimates of the slope and intercept will be biased

Remedial Measures:
-Use of a proxy variable
-Instrumental variable (IV) regression
-Clever sample selection: drop the polluted observations of X that covary with the disturbance
-Instrumental variables or control variables: in each observation, drop the polluted component of X or control for the polluted component of X
-Full information methods: model the covariation of errors across the equations
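For the instrumental-variable remedy, a minimal sketch of the just-identified IV estimator, β_IV = cov(z, y) / cov(z, x), is shown next to OLS. The data are constructed (an assumption for illustration) so that the stochastic regressor x is correlated with the disturbance u, while the instrument z is correlated with x but not with u.

```python
# Minimal sketch of the simple IV estimator versus OLS.
# All names and numbers are illustrative assumptions.

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (the n - 1 divisor cancels in the ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]        # instrument
u = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]    # disturbance, uncorrelated with z
x = [zi + ui for zi, ui in zip(z, u)]               # regressor correlated with u
y = [2.0 + 3.0 * xi + ui for xi, ui in zip(x, u)]   # true intercept 2, true slope 3

beta_ols = cov(x, y) / cov(x, x)   # biased: picks up cov(x, u)
beta_iv = cov(z, y) / cov(z, x)    # uses only the variation in x explained by z
alpha_iv = mean(y) - beta_iv * mean(x)
print(f"OLS slope = {beta_ols:.3f}, IV slope = {beta_iv:.3f}, IV intercept = {alpha_iv:.3f}")
# prints: OLS slope = 3.160, IV slope = 3.000, IV intercept = 2.000
```

Because cov(z, u) = 0 in this constructed sample, the IV ratio recovers the true slope exactly, while OLS overstates it by cov(x, u)/var(x).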

Sample of violation with remedial measures as discussed in a research output published in a professional research journal:
Foreign Direct Investment Inflows and Economic Growth in Ghana

Author:
-Baba Insah

Publication/Journal Title:
-International Journal of Economic Practices and Theories

Date of Publication, Volume No. and Year:
-April 2013, Vol. 3, No. 2

Violation:
-Stochastic regressors

How was it detected:
-Johansen test

Remedial Measure:
-The Engle-Granger two-step methodology for error correction was employed
-Error correction was utilized

ASSUMPTION 3: ZERO MEAN VALUE OF DISTURBANCE uᵢ

Departures from/Violations of assumptions (both Two-Variable Linear Model and Multiple Variable Linear Model):
-Nonzero mean of uᵢ

Graphical Approach:
-Plot the residuals against the predictor(s) or, if there are more than a couple of predictors, at least against the fitted values

Statistical Approach:
-Cobb-Douglas function

Effect/s of the violation in the model:
-The coefficient estimate is biased downward

Remedial Measures:
-Randomization of the measurement order: randomization can effectively convert systematic measurement errors into additional random process error. While adding to the random error of the process is undesirable, this provides the best possible information from the data about the regression function.
-Using additional information
-Redesigning the measurement system to eliminate the systematic errors
-Reformulating the problem to obtain the needed information another way

Sample of violation with remedial measures as discussed in a research output published in a professional research journal:
Consumption of Tobacco and Alcoholic Beverages Among Spanish Consumers

Authors & Institutional Affiliations:
-Anna-Lena Beutel
-Stefan Minner
-Department of Business Administration, University of Vienna, Austria

Publication/Journal Title:
-International Journal of Production Economics

Date of Publication & Volume No:
-2011

Violation:
-Nonzero mean of the error

How was it detected:
-Using a plot of the disturbances

Remedial Measures:
-Used a service level model, which performs better for the target fill rate than for in-stock probabilities
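Which coefficient absorbs a nonzero disturbance mean can be sketched under assumed data: when the mean is a constant μ and the model includes an intercept, OLS folds μ into the intercept, leaves the slope unaffected, and still produces residuals that average exactly zero. This is also why a residual plot is more informative about a mean that shifts with X than about a constant shift. The value μ = 2 and the data are illustrative assumptions.

```python
# Minimal sketch, assuming the disturbance has a constant nonzero mean mu = 2.

def fit_line(x, y):
    """Return (intercept, slope) of the OLS line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

mu = 2.0                                              # assumed nonzero mean of the disturbance
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
v = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]      # zero-mean part, uncorrelated with x
y = [3.0 + 0.5 * xi + mu + vi for xi, vi in zip(x, v)]  # true intercept 3, true slope 0.5

a, b = fit_line(x, y)
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(a, b, sum(resid) / len(resid))   # approximately 5.0 (= 3 + mu), 0.5, 0.0
```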

ASSUMPTION 4: HOMOSCEDASTICITY OR EQUAL VARIANCE OF uᵢ

Departures from/Violations of assumptions (both Two-Variable Linear Model and Multiple Variable Linear Model):
-Heteroscedasticity

Graphical Approach:
-Scatterplot: look for any pattern (figure panels: no heteroscedasticity; not linear and nature unknown; linear increase and presence of heteroscedasticity; heteroscedasticity with quadratic relationship; quadratic relationship)

Statistical Approach:
-Park test
-Breusch-Pagan / Cook-Weisberg test for heteroscedasticity
-White general test for heteroscedasticity
-Goldfeld-Quandt test

Effect/s of the violation in the model:
-OLS gives equal weight to all observations
-Standard errors are biased
-This in turn leads to bias in test statistics and confidence intervals

Remedial Measures:
-Respecify the model/transform the variable
-Use robust standard errors (also referred to as Huber/White estimators or sandwich estimators of variance)
-Use weighted least squares
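One of the statistical tests listed, the Breusch-Pagan test, can be sketched in its studentized (Koenker) n·R² form for a single regressor: regress the squared OLS residuals on x and compare LM = n·R² with the chi-square(1) 5% critical value of 3.841. The two data sets, one with a funnel-shaped error spread and one with constant spread, are illustrative assumptions, and the helper names are hypothetical.

```python
# Minimal sketch of the Koenker (studentized) Breusch-Pagan LM statistic.

def fit_line(x, y):
    """Return (intercept, slope) of the OLS line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def r_squared(x, y):
    """R^2 of a simple regression of y on x (the squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)

def breusch_pagan_lm(x, y):
    """LM = n * R^2 from regressing squared OLS residuals on x."""
    a, b = fit_line(x, y)
    sq_resid = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, sq_resid)

CRITICAL_5PCT_1DF = 3.841
x = [float(i) for i in range(1, 21)]
# Error spread grows with x (funnel shape).
y_het = [1.0 + 2.0 * xi + 0.5 * xi * (-1) ** i for i, xi in enumerate(x, start=1)]
# Error spread is constant.
y_hom = [1.0 + 2.0 * xi + (-1) ** i for i, xi in enumerate(x, start=1)]

lm_het = breusch_pagan_lm(x, y_het)
lm_hom = breusch_pagan_lm(x, y_hom)
print(lm_het > CRITICAL_5PCT_1DF, lm_hom > CRITICAL_5PCT_1DF)   # True False
```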

Sample of violation with remedial measures as discussed in a research output published in a professional research journal:
Meta-analysis of alcohol price and income elasticities – with corrections for publication bias

Authors & Institutional Affiliations:
-Jon P. Nelson

Publication/Journal Title:
-Health Economics Review

Date of Publication & Volume No:
-2013, 3:17

Violation:
-Heteroscedasticity

How was it detected:
-Scatterplot; a "funnel-shaped" plot was detected

Remedial Measures:
-Dividing equation (1) by the standard error yields a regression of tᵢ on 1/SEᵢ with error term vᵢ, where tᵢ is the t-statistic for the i-th observation, 1/SEᵢ is its precision, and vᵢ is an error term corrected for heteroscedasticity.

ASSUMPTION 5: NO AUTOCORRELATION BETWEEN THE DISTURBANCES

Departures from/Violations of assumptions (both Two-Variable Linear Model and Multiple Variable Linear Model):
-Autocorrelated disturbances

Graphical Approach:
-Time sequence plot
-Residual plots

Statistical Approach:
-The runs test
-The Durbin-Watson test
-The Breusch-Godfrey test

Effect/s of the violation in the model:
-The OLS estimator is unbiased
-The OLS estimator is inefficient; that is, it is not BLUE
-The estimated variances and covariances of the OLS estimates are biased and inconsistent
-If there is positive autocorrelation, and if the value of a right-hand-side variable grows over time, then the estimated standard error of that variable's coefficient will be too low and hence the t-statistic too high
-Hypothesis tests are not valid

Remedial Measures:
-Cochrane-Orcutt estimator
-Hildreth-Lu estimator
-AUTOREG procedure (SAS)

Sample of violation with remedial measures as discussed in a research output published in a professional research journal:
An Examination of Socioeconomic Determinants of Average Body Mass Indices in Rwanda

Authors & Institutional Affiliations:
-Edward Mutandwa
-College of Forest Resources, Mississippi State University, USA

Publication/Journal Title:
-Open Obesity Journal

Date of Publication & Volume No:
-2015, 7, 1-9

Violation:
-Positive autocorrelation

How was it detected:
-By using the Durbin-Watson test; the value obtained, 0.93, indicates positive autocorrelation (p<0.05)

Remedial Measures:
-By using "proc autoreg"
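Two of the tools named above can be sketched briefly: the Durbin-Watson statistic, d = Σ(eₜ − eₜ₋₁)² / Σeₜ², which lies roughly at 2(1 − ρ̂) (so a value such as the 0.93 reported in the sample study points to positive autocorrelation), and the quasi-differencing step yₜ* = yₜ − ρyₜ₋₁ that Cochrane-Orcutt-type estimators apply before re-running OLS. A full Cochrane-Orcutt procedure repeats the estimation of ρ and OLS until ρ settles; only the transformation is shown here. Function names and residual vectors are hypothetical.

```python
# Minimal sketch of AR(1) diagnostics and the Cochrane-Orcutt transformation.

def durbin_watson(e):
    """Durbin-Watson d statistic computed from a residual series."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(et ** 2 for et in e)

def rho_hat(e):
    """First-order autocorrelation coefficient of the residuals."""
    return sum(e[t] * e[t - 1] for t in range(1, len(e))) / sum(et ** 2 for et in e)

def quasi_difference(series, rho):
    """Cochrane-Orcutt transformation; the first observation is dropped."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]

positive = [1.0, 1.0, -1.0, -1.0]   # runs of same-signed residuals
negative = [1.0, -1.0, 1.0, -1.0]   # residuals that flip sign every period
print(durbin_watson(positive))       # 1.0 (below 2: positive autocorrelation)
print(durbin_watson(negative))       # 3.0 (above 2: negative autocorrelation)
print(quasi_difference([1.0, 2.0, 4.0], 0.5))   # [1.5, 3.0]
```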

ASSUMPTION 6: ZERO COVARIANCE BETWEEN uᵢ AND Xᵢ

Departures from/Violations of assumptions (both Two-Variable Linear Model and Multiple Variable Linear Model):
-Nonzero covariance between the disturbances and the regressor

Graphical Approach:
-Graphically examining the cumulative sum of the recursive residuals

Statistical Approach:
-Durbin-Wu-Hausman test for endogeneity

Effect/s of the violation in the model:
-The OLS estimator will be both biased and inconsistent

Remedial Measures (for endogeneity):
-Clever sample selection: drop the polluted observations of X that covary with the disturbance
-Instrumental variables or control variables: in each observation, drop the polluted component of X or control for the polluted component of X
-Full information methods: model the covariation of errors across the equations
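The "biased and inconsistent" effect can be sketched with a small Monte Carlo experiment, in the spirit of the Monte Carlo approach in the sample study below but not a reproduction of it. The design is an assumption: x = z + u and y = 1 + 2x + u with independent standard normal z and u, so the OLS slope converges to 2 + cov(x, u)/var(x) = 2.5 rather than the true 2, no matter how large the sample.

```python
# Minimal Monte Carlo sketch: OLS with a regressor correlated with the disturbance.
import random

def ols_slope(x, y):
    """OLS slope of y on x (model with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx

def mean_ols_slope(n, reps, seed=1):
    """Average OLS slope over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]        # disturbance
        x = [zi + ui for zi, ui in zip(z, u)]              # regressor shares u: cov(x, u) = 1
        y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]  # true slope = 2
        total += ols_slope(x, y)
    return total / reps

for n in (50, 500, 2000):
    # The average stays near 2.5 (not 2.0) however large n becomes.
    print(n, round(mean_ols_slope(n, reps=100), 3))
```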

Sample of violation with remedial measures as discussed in a research output published in a professional research journal:
Understanding Estimators of Linear Regression Model with AR(1) Error Which Are Correlated with Exponential Regressor

Authors & Institutional Affiliations:
-J. O. Olaomi
-A. Ifederu
-Department of Statistics, University of Ibadan, Ibadan, Nigeria

Publication/Journal Title:
-Asian Journal of Mathematics and Statistics 1(1), ISSN 1994-5418, © Asian Network for Scientific Information

Date of Publication & Volume No:
-2008, 14-23

Violation:
-Autocorrelation
-Nonzero covariance between the explanatory variable and the error terms

How was it detected:
-Monte Carlo approach for the investigation

Remedial Measures:
-Generalised least squares (GLS) estimators (CORC, HILU, ML and MLGRID) and OLS estimation methods
-Evaluation of the estimators using the finite-sample properties of bias (BIAS), sum of biases of the intercept and slope coefficients (SBIAS), variance (VAR), sum of variances of the intercept and slope coefficients (SVAR), root mean squared error (RMSE), and sum of RMSE of the intercept and slope coefficients (SRMSE)

ASSUMPTION 7: THE NUMBER OF OBSERVATIONS N MUST BE GREATER THAN THE NUMBER OF PARAMETERS TO BE ESTIMATED

Departures from/Violations of assumptions (both Two-Variable Linear Model and Multiple Variable Linear Model):
-Sample observations fewer than the number of regressors (micronumerosity)
-When the number of observations barely exceeds the number of parameters to be estimated, there is near micronumerosity

Effect/s of the violation in the model:
-Precision of estimation is reduced
-Estimates of μ may have large errors
-The variance of the sample mean will be large
-It will sometimes lead to accepting the hypothesis μ = 0 because the ratio of the sample mean to its standard error is small
-The estimate of μ will be very sensitive to the sample data
-The addition of a few more observations can sometimes produce drastic shifts in the sample mean

Remedial Measures:
-Increase the sample size

Sample of violation with remedial measures as discussed in a research output published in a professional research journal:
Some New Proposed Ridge Parameters for the Logistic Regression Model

Authors & Institutional Affiliations:
-Ahlam Abdullah Alsomahi
-Lutfiah Ismail Al turk
-Department of Statistics, Faculty of Sciences, King Abdulaziz University, Kingdom of Saudi Arabia

Publication/Journal Title:
-International Journal of Development Research

Date of Publication & Volume No:
-January 2015, Vol. 5, Issue 01, pp. 2927-2940

Violation:
-Small sample size n

How was it detected:
-Based on the resulting estimate: the variance of the estimated parameters is large

Remedial Measures:
-The sample size is increased with the number of independent variables
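The effect of a small n on tests about μ can be illustrated with a hypothetical sample whose mean is 2. With three observations the standard error of the mean, s/√n, is large enough that μ = 0 is not rejected at the 5% level (two-sided critical value t(2) = 4.303); observing the same values four times each (n = 12, critical value t(11) = 2.201) leads to a clear rejection. The data are assumptions chosen for illustration.

```python
# Minimal sketch: the t-ratio of a sample mean under micronumerosity.
import math

def t_ratio_of_mean(sample):
    """Sample mean divided by its standard error s / sqrt(n)."""
    n = len(sample)
    m = sum(sample) / n
    s = math.sqrt(sum((v - m) ** 2 for v in sample) / (n - 1))
    return m / (s / math.sqrt(n))

small = [1.0, 2.0, 3.0]
large = small * 4                  # 12 observations with the same mean

t_small = t_ratio_of_mean(small)   # below 4.303: do not reject mu = 0
t_large = t_ratio_of_mean(large)   # above 2.201: reject mu = 0
print(round(t_small, 2), round(t_large, 2))   # 3.46 8.12
```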

ASSUMPTION 8: VARIABILITY IN X VALUES

Departures from/Violations of assumptions (both Two-Variable Linear Model and Multiple Variable Linear Model):
-Insufficient variability in the regressors

Graphical Approach:
-Plot the values of X and then draw a line through the plotted values of X.
-This is easy if there is variability in the values of X.
-If it is hard to draw a specific line corresponding to the values of X, then there is a violation.

Statistical Approach:
-Formula to test the variability of X: $\operatorname{var}(X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

Effect/s of the violation in the model:
-Each value of X would be equal to its mean, so the denominator of the OLS slope estimator, $\sum_{i=1}^{n} (X_i - \bar{X})^2$, would be zero, making it impossible to estimate the slope
-The variation in Y could not be explained by X

Remedial Measures:
-Accurate sample size
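The variability formula and its consequence for estimation can be checked directly. This minimal sketch (the helper names are hypothetical) computes var(X) and the OLS slope, refusing to divide when Σ(Xᵢ − X̄)² is zero.

```python
# Minimal sketch of the X-variability check and its effect on the OLS slope.

def sample_variance(x):
    """var(X) = sum((X_i - X_bar)^2) / (n - 1)."""
    n = len(x)
    m = sum(x) / n
    return sum((xi - m) ** 2 for xi in x) / (n - 1)

def ols_slope(x, y):
    """OLS slope; its denominator is sum((X_i - X_bar)^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("no variability in X: the OLS slope cannot be estimated")
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx

print(sample_variance([1.0, 2.0, 3.0]), ols_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0 2.0
print(sample_variance([2.0, 2.0, 2.0]))  # 0.0
try:
    ols_slope([2.0, 2.0, 2.0], [1.0, 5.0, 3.0])
except ValueError as err:
    print(err)
```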

Sample of violation with remedial measures as discussed in a research output published in a professional research journal:
Household Sample Surveys in Developing and Transition Countries
Measurement error in household surveys: sources and measurement

Authors & Institutional Affiliations:
-Daniel Kasprzyk
-Mathematica Policy Research, Washington, D.C., United States of America

Violation:
-Measurement error

Remedial Measures:
-Quantifying the existence and magnitude of a specific type of measurement error requires advance planning and thoughtful consideration.
-Nevertheless, if there is sufficient concern that the issue may not be adequately resolved during survey preparations, or if the source of error is particularly egregious in the survey being conducted, survey managers should take steps to design special studies to quantify the principal or problematic source of error.

ASSUMPTION 9: THE REGRESSION MODEL IS CORRECTLY SPECIFIED

Departures from/Violations of assumptions (both Two-Variable Linear Model and Multiple Variable Linear Model):
Model specification error or model specification bias:
-Omission of a relevant variable(s)
-Inclusion of an unnecessary variable(s)
-Adopting the wrong functional form
-Errors of measurement
-Incorrect specification of the stochastic error term

Graphical Approach:
-Plot the residuals against your variables; noticeable patterns indicate possible specification errors

Statistical Approach:
Ramsey's Regression Specification Error Test (RESET):
-Run your regression and obtain the predicted Y (Ŷ)
-Rerun your regression with variants of Ŷ (for example, Ŷ² and Ŷ³) on the right-hand side
-Conduct an F-test to evaluate the joint significance of the Ŷ terms
-If the F-test indicates that the Ŷ terms improve the fit of the model, then the model is likely misspecified

Effect/s of the violation in the model:
-If the omitted variable is correlated with the included variable, then the estimates of the constant and slope coefficients are biased and inconsistent; in other words, the bias does not disappear as the sample size gets larger
-If the omitted variable is not correlated with the included variables, then the slope coefficient is unbiased; however, the coefficient on the constant term remains biased
-The disturbance variance σ² is incorrectly estimated
-The estimated variance of the slope coefficient is a biased estimator of the variance of the true estimator from the fully specified model
-Confidence intervals and hypothesis-testing procedures are likely to give misleading results about the significance of the parameters

Remedial Measures:
-Reformulate the model
-Remove the irrelevant variable
-Transformation of variables

Sample of violation with remedial measures as discussed in a research output published in a professional research journal:
The Asymmetric Effect of Income on Import Demand in Greece

Authors & Institutional Affiliations:
-Ioanna C. Bardakas
-Bank of Greece

Date of Publication & Volume No:
-May 2013, No. 159

Violation:
-Model specification error

How was it detected:
-RESET test

Remedial Measures:
-A short-run error correction equation is estimated, insignificant variables are discarded, and a more parsimonious specification is adopted.
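The RESET steps can be sketched with a single added term, the square of the fitted values, on assumed data where the true relationship is quadratic but the estimated model is linear. The fitted values are standardized before squaring to keep the normal equations well conditioned; because the model already contains a constant and x, this leaves the F statistic unchanged. The 5% critical value of F(1, 17) is about 4.45. The helper names are hypothetical, and a full RESET would typically add further powers of Ŷ.

```python
# Minimal RESET sketch with one added term (squared, standardized fitted values).

def ols(rows, y):
    """Least squares via the normal equations, solved by Gaussian elimination."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, k):
            factor = a[row][col] / a[col][col]
            for c in range(col, k + 1):
                a[row][c] -= factor * a[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b

def ssr(rows, y, b):
    """Sum of squared residuals for coefficient vector b."""
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(rows, y))

def reset_f(x, y):
    """F statistic for adding the squared (standardized) fitted values."""
    restricted = [[1.0, xi] for xi in x]
    b_r = ols(restricted, y)
    fitted = [b_r[0] + b_r[1] * xi for xi in x]
    m = sum(fitted) / len(fitted)
    sd = (sum((f - m) ** 2 for f in fitted) / (len(fitted) - 1)) ** 0.5
    z = [(f - m) / sd for f in fitted]
    unrestricted = [r + [zi ** 2] for r, zi in zip(restricted, z)]
    b_u = ols(unrestricted, y)
    ssr_r, ssr_u = ssr(restricted, y, b_r), ssr(unrestricted, y, b_u)
    q, n, k_u = 1, len(y), 3   # 1 added term, 3 parameters in the unrestricted model
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_u))

x = [float(i) for i in range(1, 21)]
noise = [0.1 * (-1) ** i for i in range(1, 21)]
y = [1.0 + 0.5 * xi ** 2 + e for xi, e in zip(x, noise)]   # true model is quadratic
F = reset_f(x, y)
print(F > 4.45)   # True: the linear specification is flagged
```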

ASSUMPTION 10: THERE IS NO MULTICOLLINEARITY

Departures from/Violations of assumptions (both Two-Variable Linear Model and Multiple Variable Linear Model):
-Multicollinearity

Graphical Approach:
-Scatterplot

Statistical Approach:
-Regress and look for a "high" R² but few significant t ratios

Effect/s of the violation in the model:
-The coefficients cannot be estimated precisely
-Standard errors will be infinite (under perfect multicollinearity)

Remedial Measures:
-Check for errors or problematic computations of predictor variables
-Eliminate one of the redundant variables
-Average the redundant variables
-Transformation of variables
-New data (extend the time series, change the nature or source of data)
-Do nothing
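The "high R²" idea connects to the variance inflation factor, VIF = 1/(1 − R²ⱼ), where R²ⱼ comes from the auxiliary regression of regressor j on the other regressors; with only two regressors it is their squared correlation. For instance, an auxiliary R² of 0.9897, as reported in the sample study below, corresponds to a VIF of about 97. The example vectors and helper names are illustrative assumptions.

```python
# Minimal sketch of the variance inflation factor for two regressors.

def squared_correlation(a, b):
    """R^2 of the auxiliary regression of one regressor on the other."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab ** 2 / (saa * sbb)

def vif(r2):
    """VIF = 1 / (1 - R^2); infinite under perfect multicollinearity."""
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)

x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2_unrelated = [2.0, -1.0, -2.0, -1.0, 2.0]   # uncorrelated with x1
x2_redundant = [2.0 * v + 1.0 for v in x1]     # exact linear function of x1

print(vif(squared_correlation(x1, x2_unrelated)))   # 1.0
print(vif(squared_correlation(x1, x2_redundant)))   # inf
print(round(vif(0.9897), 1))                        # 97.1
```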

Sample of violation with remedial measures as discussed in a research output published in a professional research journal:
Population Ageing and Health Care Expenditure: New Evidence on the "Red Herring"

Authors & Institutional Affiliations:
-Peter Zweifel
-Stefan Felder
-Andreas Werblow

Publication/Journal Title:
-The Geneva Papers on Risk and Insurance

Date of Publication & Volume No:
-October 2004, Vol. 29, No. 4

Violation:
-Multicollinearity

How was it detected:
-An OLS regression of the inverse Mills ratio λ on the explanatory variables results in an R² of 0.9897, suggesting almost perfect linear dependence

Remedial Measures:
-Multicollinearity is at least mitigated by employing a two-part model in addition to the Heckman model. The two-part model separates the selection part from the equation that explains the level of health care expenditure (HCE). Thus, the correlation between the selection term λ and the age variables, as a source of multicollinearity, is eliminated.

ASSUMPTION 11: NORMALITY OF DISTURBANCES

Departures from/Violations of assumptions (both Two-Variable Linear Model and Multiple Variable Linear Model):
Non-normality of disturbances:
-Skew: non-symmetry; one tail longer than the other
-Kurtosis: too flat or too peaked (kurtosed)
-Outliers: individual cases that are far from the distribution

Graphical Approach:
-Histograms
-Boxplots
-P-P plots

Statistical Approach:
-Anderson-Darling test of normality
-Jarque-Bera test of normality
-Correlation test: obtain the correlation between the observed residuals and their expected values under normality, compare it with the critical value, and reject the null hypothesis of normal errors if the correlation falls below the table value
-Shapiro-Wilk test

Effect/s of the violation in the model:
-In finite samples, without the normality assumption, the usual t and F statistics may not follow the t and F distributions
-Skew biases the mean in the direction of the skew
-Kurtosis biases the standard deviation, and hence the standard errors and significance tests

Remedial Measures:
-Box-Cox transformations
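The Jarque-Bera statistic listed above can be computed from residuals as JB = (n/6)[S² + (K − 3)²/4], where S is the sample skewness and K the sample kurtosis (moment estimators with divisor n); under normality JB is approximately chi-square with 2 degrees of freedom, with a 5% critical value of 5.991. The residual vectors are illustrative assumptions, and with only five observations the chi-square approximation is poor, so these are arithmetic checks rather than meaningful tests.

```python
# Minimal sketch of the Jarque-Bera statistic from a residual vector.
import math

def skew_kurtosis(e):
    """Sample skewness S and kurtosis K (moment estimators, divisor n)."""
    n = len(e)
    m = sum(e) / n
    m2 = sum((v - m) ** 2 for v in e) / n
    m3 = sum((v - m) ** 3 for v in e) / n
    m4 = sum((v - m) ** 4 for v in e) / n
    return m3 / (m2 * math.sqrt(m2)), m4 / m2 ** 2

def jarque_bera(e):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4)."""
    s, k = skew_kurtosis(e)
    return len(e) / 6.0 * (s ** 2 + (k - 3.0) ** 2 / 4.0)

symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
skewed = [0.0, 0.0, 0.0, 0.0, 10.0]       # one long right tail

print(skew_kurtosis(symmetric))           # (0.0, 1.7): symmetric but flat
print(skew_kurtosis(skewed))              # (1.5, 3.25): positive skew
print(round(jarque_bera(symmetric), 4))   # 0.3521
```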

Sample of violation with remedial measures as discussed in a research output published in a professional research journal:
Sales Location and Supply Response among Semisubsistence Farmers in Benin

Authors & Institutional Affiliations:
-Hiroyuki Takeshima
-Alex Winter-Nelson
-Development Strategy and Governance Division

Publication/Journal Title:
-International Food Policy Research Institute Discussion Paper

Date of Publication & Volume No:
-July 2010, 00999

Violation:
-Nonnormality

How was it detected:
-By using the Lagrange Multiplier (LM) test, which is derived from the Jarque-Bera test

Remedial Measure/s:
-Certain insignificant variables are dropped from each specification when their omission leads to stronger evidence of consistency of the model, as diagnosed by the normality test.