assumptions of simple and multiple linear regression model
Econometrics Project, Group 8, BS Economics 3-1, Academic Year 2014-2015
Polytechnic University of the Philippines
Sta. Mesa, Manila
ECON 3023 ECONOMETRICS
GROUP PROJECT
Submitted By:
GROUP 8
Camacho, Irwin Dave
De Ramos, Liezel
Gonzales, Divina
Oliver, Ralph Laurence
B.S. ECONOMICS 3-1
Submitted To:
Prof. Alberto Guillo
Vice President for Administration, PUP
ASSUMPTIONS OF SIMPLE AND
MULTIPLE REGRESSION MODEL
ASSUMPTION 1: LINEAR REGRESSION MODEL
Departures from / Violations of assumptions (both Two-Variable Linear Model and Multiple Variable Linear Model)
Graphical Approach
Statistical Approach
Effect/s of the violation in the model
Remedial Measures
Sample of violation with remedial measures as discussed in a research output published in a professional research journal (at least one for each violation)
Violation:
-Nonlinearity in parameters
Graphical Approach:
-Scatterplot
-Plot of observed versus predicted values
-Plot of residuals versus predicted values. This is better than the observed-versus-predicted plot for this purpose, because it eliminates the visual distraction of a sloping pattern.
Statistical Approach:
-Chow Test
Effect/s of the violation in the model:
-The normal equations for nonlinear regression have the unknowns (the B's) on both the left-hand and right-hand sides of the equations. As a consequence, we cannot obtain explicit solutions for the unknowns in terms of the known quantities.
-Wrong regressors
-Changing parameters
Remedial Measures:
-Perform a curvilinear transformation. If there appears to be a quadratic pattern in the residuals, a polynomial transformation of degree 2 should be tried.
-Log transformation
-Locally Weighted Scatterplot Smoothing (LOWESS)
Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff
Authors & Institutional Affiliations: -Thomas Herndon -Michael Ash -Robert Pollin -Political Economy Research Institute, -University of Massachusetts Amherst
Publication/Journal Title: -Working Paper Series
Date of Publication, Volume No. and Year: -April 2013, Number 322
Violation: -Nonlinear relationship between the variables
How was it detected: -Scatterplot
Remedial Measure: -LOWESS
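The log transformation listed among the remedial measures can be sketched as follows. This is only an illustration under an assumed power-law relationship y = a*x^b; the data, the helper name `ols_simple`, and the values a = 3 and b = 0.5 are hypothetical and do not come from the project or from the cited paper. Taking logs gives ln y = ln a + b ln x, which is linear in the parameters and can be fitted by ordinary least squares.

```python
import math

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data generated from y = 3 * x**0.5 (an assumption for illustration).
x = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
y = [3.0 * xi ** 0.5 for xi in x]

# A straight line fitted in levels leaves systematic (curved) residuals.
b0, b1 = ols_simple(x, y)
residuals_levels = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# After the log transformation the relationship is linear in the parameters.
log_x = [math.log(xi) for xi in x]
log_y = [math.log(yi) for yi in y]
c0, c1 = ols_simple(log_x, log_y)
a_hat, b_hat = math.exp(c0), c1  # recover a and b of y = a * x**b

print(a_hat, b_hat)
print(residuals_levels)
```

Plotting `residuals_levels` against the fitted values would show the curved pattern that the graphical approach looks for; the residuals of the log-log fit show no such pattern.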
ASSUMPTION 2: X VALUES ARE FIXED IN REPEATED SAMPLING
Violation:
-Stochastic regressor(s)
-Endogeneity
Graphical Approach:
-Graphically examining the cumulative sum of the recursive residuals
Statistical Approach:
-Durbin-Wu-Hausman Specification Test
-Johansen Test
-Engle-Granger two-step method
Effect/s of the violation in the model:
-Simultaneous equation or simultaneity problem
-The estimates of the slope and intercept will be biased
Remedial Measures:
-Use of a proxy variable
-Instrumental Variable (IV) Regression
-Clever Sample Selection. Drop the polluted observations of X that covary with the disturbance.
-Instrumental Variables or Control Variables. In each observation, drop the polluted component of X or control for the polluted component of X.
-Full Information Methods. Model the covariation of errors across the equations.
Foreign Direct Investment Inflows and Economic Growth in Ghana
Author: -Baba Insah
Publication/Journal Title: -International Journal of Economic Practices and Theories
Date of Publication, Volume No. and Year: -April 2013, Vol.3, No.2
Violation: -Stochastic regressors
How was it detected: -Johansen Test
Remedial Measure: -Engle-Granger two-step methodology for error correction was employed -Error Correction was utilized.
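The instrumental variable remedy listed above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the method of the cited paper: the data are simulated, the instrument `z`, the coefficient values, and the helper `cov` are all hypothetical. The regressor x is built to covary with the disturbance u, so the OLS slope is biased, while the simple IV estimator cov(z, y) / cov(z, x) uses only the part of x that is driven by the instrument.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)

random.seed(0)
n = 5000
true_slope = 2.0  # hypothetical population slope

z = [random.gauss(0, 1) for _ in range(n)]   # instrument: drives x, unrelated to u
u = [random.gauss(0, 1) for _ in range(n)]   # disturbance
x = [zi + 0.8 * ui for zi, ui in zip(z, u)]  # regressor "polluted" by u
y = [1.0 + true_slope * xi + ui for xi, ui in zip(x, u)]

ols_slope = cov(x, y) / cov(x, x)  # biased, because x covaries with u
iv_slope = cov(z, y) / cov(z, x)   # uses only the variation in x that comes from z

print(round(ols_slope, 3), round(iv_slope, 3))
```

In this setup the OLS slope settles well above 2 while the IV slope stays close to it, which is the sense in which the polluted component of X is dropped.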
ASSUMPTION 3: ZERO MEAN VALUE OF DISTURBANCE ui
Violation:
-Nonzero mean of ui
Graphical Approach:
-Plot of residuals against the predictor(s); if there are more than a couple of predictors, at least against fitted values
Statistical Approach:
-Cobb-Douglas Function
Effect/s of the violation in the model:
-The coefficient estimate is biased downward
Remedial Measures:
-Randomization of the measurement order. Randomization can effectively convert systematic measurement errors into additional random process error. While adding to the random error of the process is undesirable, this will provide the best possible information from the data about the regression function.
-Using additional information
-Redesigning the measurement system to eliminate the systematic errors
-Reformulating the problem to obtain the needed information another way
Consumption of Tobacco and Alcoholic Beverages Among Spanish Consumers
Authors & Institutional Affiliations: -Anna-Lena Beutel -Stefan Minner -Department of Business Administration, University of Vienna, Austria
Publication/Journal Title: -International Journal of Production Economics
Date of Publication & Volume No: -2011
Violation: -Nonzero mean of error
How was it detected: -Using plot of disturbance
Remedial Measures: -Used service level model, which performs better for the target fill rate than for in-stock probabilities
ASSUMPTION 4: HOMOSCEDASTICITY OR EQUAL VARIANCE OF 𝒖𝒊
Violation:
-Heteroscedasticity
Graphical Approach:
-Scatterplot. Look for any pattern; the patterns illustrated are: no heteroscedasticity; not linear and nature is unknown; linear increase and presence of heteroscedasticity; heteroscedasticity with quadratic relationship; quadratic relationship.
Statistical Approach:
-Park Test
-Breusch-Pagan / Cook-Weisberg Test for Heteroscedasticity
-White General Test for Heteroscedasticity
-Goldfeld-Quandt Test
Effect/s of the violation in the model:
-OLS gives equal weight to all observations
-Standard errors are biased
-This in turn leads to bias in test statistics and confidence intervals
Remedial Measures:
-Respecify the model / transform the variable
-Use robust standard errors (also referred to as Huber/White estimators or sandwich estimators of variance)
-Use Weighted Least Squares
Meta-analysis of alcohol price and income elasticities – with corrections for publication bias
Authors & Institutional Affiliations: -Jon P. Nelson
Publication/Journal Title: -Health Economics Review
Date of Publication & Volume No: -2013, 3:17
Violation: -Heteroscedasticity
How was it detected: -Scatterplot; a "funnel-shaped" plot was detected
Remedial Measures: -Dividing equation (1) by the standard error to yield a regression of the t-statistic on precision, where t_i is the t-statistic for the i-th observation, 1/Se_i is its precision, and v_i is an error term corrected for heteroscedasticity.
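The Goldfeld-Quandt test named under the statistical approach can be sketched as below. This is a minimal illustration on simulated data, assuming a single regressor and an error standard deviation that grows with X; the function names, the 20 percent of central observations dropped, and the data are hypothetical choices, not taken from the cited paper. The observations are ordered by X, the middle block is dropped, separate regressions are fitted to the low-X and high-X groups, and the ratio of their residual variances is the F statistic.

```python
import random

def rss_simple_ols(x, y):
    """Residual sum of squares from fitting y = b0 + b1*x by least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def goldfeld_quandt(x, y, drop_fraction=0.2):
    """Return (F ratio of high-X to low-X residual variances, df of each group)."""
    pairs = sorted(zip(x, y))            # order observations by X
    n = len(pairs)
    n_drop = int(n * drop_fraction)      # central observations to omit
    n_each = (n - n_drop) // 2
    low, high = pairs[:n_each], pairs[-n_each:]
    df = n_each - 2                      # two parameters per sub-regression
    rss_low = rss_simple_ols([p[0] for p in low], [p[1] for p in low])
    rss_high = rss_simple_ols([p[0] for p in high], [p[1] for p in high])
    return (rss_high / df) / (rss_low / df), df

random.seed(1)
x = [float(i) for i in range(1, 201)]
# Error standard deviation grows with x, producing the "funnel" pattern.
y = [5.0 + 2.0 * xi + random.gauss(0, 0.1 * xi) for xi in x]

f_stat, df = goldfeld_quandt(x, y)
print(f_stat, df)
```

The statistic is compared with the F critical value for (df, df) degrees of freedom; a value far above 1 points to an error variance that rises with X.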
ASSUMPTION 5: NO AUTOCORRELATION BETWEEN THE DISTURBANCES
Violation:
-Autocorrelated disturbances
Graphical Approach:
-Time sequence plot
-Residuals plots
Statistical Approach:
-The Runs Test
-The Durbin-Watson Test
-The Breusch-Godfrey Test
Effect/s of the violation in the model:
-The OLS estimator is unbiased
-The OLS estimator is inefficient; that is, it is not BLUE
-The estimated variances and covariances of the OLS estimates are biased and inconsistent
-If there is positive autocorrelation, and if the value of a right-hand side variable grows over time, then the estimate of the standard error of the coefficient estimate of this variable will be too low and hence the t-statistic too high
-Hypothesis tests are not valid
Remedial Measures:
-Cochrane-Orcutt estimator
-Hildreth-Lu estimator
-AUTOREG Procedure
An Examination of Socioeconomic Determinants of Average Body Mass Indices in Rwanda
Authors & Institutional Affiliations: -Edward Mutandwa -College of Forest Resources, Mississippi State University, USA
Publication/Journal Title: -Open Obesity Journal
Date of Publication & Volume No: -2015, 7, 1-9
Violation: -Positive autocorrelation
How was it detected: -By using the Durbin-Watson test; the value obtained was 0.93, indicating positive autocorrelation (p<0.05)
Remedial Measures: -By using "proc autoreg"
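The Durbin-Watson statistic used in the sample study can be computed directly from the residuals as d = sum of (e_t - e_(t-1))^2 divided by sum of e_t^2. The sketch below uses two made-up residual series (they are assumptions for illustration, not data from the cited paper) and also applies the usual approximation d ≈ 2(1 - rho) to the reported value of 0.93.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: squared successive differences over squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residual series, chosen only to show the two extremes.
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]    # long runs of the same sign
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign flips every period

d_positive = durbin_watson(persistent)    # well below 2: positive autocorrelation
d_negative = durbin_watson(alternating)   # well above 2: negative autocorrelation

# Approximate first-order autocorrelation implied by the reported d = 0.93.
rho_implied = 1 - 0.93 / 2

print(d_positive, d_negative, rho_implied)
```

Values of d near 2 are consistent with no first-order autocorrelation, so the reported 0.93 corresponds to an implied rho of roughly 0.54.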
ASSUMPTION 6: ZERO COVARIANCE BETWEEN 𝒖𝒊 AND 𝑿𝒊
Violation:
-Nonzero covariance between disturbances and regressor
Graphical Approach:
-Graphically examining the cumulative sum of the recursive residuals
Statistical Approach:
-Durbin-Wu-Hausman test for endogeneity
Effect/s of the violation in the model:
-OLS estimator will be both biased and inconsistent
-Endogeneity
Remedial Measures:
-Clever Sample Selection. Drop the polluted observations of X that covary with the disturbance.
-Instrumental Variables or Control Variables. In each observation, drop the polluted component of X or control for the polluted component of X.
-Full Information Methods. Model the covariation of errors across the equations.
Understanding Estimators of Linear Regression Model with AR (1) Error Which are Correlated with Exponential Regressor
Authors & Institutional Affiliations: -J. O. Oalomi -A. Ifederu -Department of Statistics, University of Ibadan, Ibadan, Nigeria
Publication/Journal Title: -Asian Journal of Mathematics and Statistics 1 (1), ISSN 1994-5418, © Asian Network for Scientific Information
Date of Publication & Volume No: -2008, 14-23
Violation: -Autocorrelation -Nonzero covariance between the explanatory variable and the error terms
How was it detected: -Monte Carlo approach for the investigation
Remedial Measures: -Generalised Least Squares (GLS) estimators: CORC, HILU, ML and MLGRID, and OLS estimation methods -Evaluation of the estimators using the finite sampling properties of Bias (BIAS), Sum of Bias of intercept and slope coefficients (SBIAS), Variance (VAR), Sum of Variances of intercept and slope coefficients (SVAR), Root Mean Squared Error (RMSE) and Sum of RMSE of intercept and slope coefficients (SRMSE)
ASSUMPTION 7: THE NUMBER OF OBSERVATIONS N MUST BE GREATER THAN THE NUMBER OF PARAMETERS TO BE ESTIMATED
Violation:
-Sample observations less than the number of regressors
-Micronumerosity
-When the number of observations barely exceeds the number of parameters to be estimated, there is near micronumerosity
Effect/s of the violation in the model:
-Precision of estimation is reduced
-Estimates of μ may have large errors
-The variance of the sample mean will be large
-Will sometimes lead to accepting the hypothesis μ = 0 because the ratio of the sample mean to its standard error is small
-The estimate of μ will be very sensitive to sample data
-The addition of a few more observations can sometimes produce drastic shifts in the sample mean
Remedial Measures:
-Increase the sample size
Some New Proposed Ridge Parameters For the Logistic Regression Model
Authors & Institutional Affiliations: -Ahlam Abdullah Alsomahi -Lutfiah Ismail Al turk -Department of Statistics, Faculty of Sciences, King Abdulaziz University, Kingdom of Saudi Arabia
Publication/Journal Title: -International Journal of Development Research
Date of Publication & Volume No: -January, 2015, Vol. 5, Issue 01, pp. 2927-2940
Violation: -Small sample size n
How was it detected: -Based on the resulting estimate. The variance of the estimated parameters is large.
Remedial Measures: -The sample size is increased with the number of independent variables
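The effect described above, where a small sample leads to accepting μ = 0 because the ratio of the sample mean to its standard error is small, can be shown with simple arithmetic. The numbers below are hypothetical and the function name `mean_to_se_ratio` is illustrative; the only formula used is the standard error of the sample mean, s / sqrt(n).

```python
import math

def mean_to_se_ratio(sample_mean, sample_sd, n):
    """Ratio of the sample mean to its standard error, sample_sd / sqrt(n)."""
    standard_error = sample_sd / math.sqrt(n)
    return sample_mean / standard_error

# Hypothetical case: the same mean and spread observed with two sample sizes.
ratio_small_n = mean_to_se_ratio(sample_mean=2.0, sample_sd=5.0, n=4)
ratio_large_n = mean_to_se_ratio(sample_mean=2.0, sample_sd=5.0, n=100)

print(ratio_small_n, ratio_large_n)
```

With n = 4 the ratio is below 1, so μ = 0 would not be rejected; with n = 100 the same mean and spread give a ratio of 4, which is why increasing the sample size is the listed remedy.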
ASSUMPTION 8: VARIABILITY IN X VALUES
Violation:
-Insufficient variability in regressors
Graphical Approach:
-Plot the values of X and then draw a line through the plotted values of X. This is easy if there is variability in the values of X. If it is hard to draw a specific line corresponding to the values of X, then there is a violation.
Statistical Approach:
-Formula to test the variability of X: $\operatorname{var}(X) = \dfrac{\sum (X_i - \bar{X})^2}{n - 1}$
Effect/s of the violation in the model:
-Each value of X would be equal to the mean, so the denominator of the slope estimator, $\sum (X_i - \bar{X})^2$, would be zero, making the slope impossible to estimate
-Variation in Y could not be explained
Remedial Measures:
-Accurate sample size
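The variance formula and its consequence for estimation can be checked numerically. The sketch below is illustrative only: the data and the function names are assumptions, and the slope formula used is the standard two-variable OLS estimator, whose denominator is the sum of squared deviations of X from its mean.

```python
def sample_variance(x):
    """var(X) = sum((X_i - X_bar)^2) / (n - 1)."""
    n = len(x)
    x_bar = sum(x) / n
    return sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

def ols_slope(x, y):
    """Two-variable OLS slope; undefined when X has no variability."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("no variability in X: the slope cannot be estimated")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical data: X is constant, so every X_i equals the mean.
constant_x = [4.0, 4.0, 4.0, 4.0, 4.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0]

try:
    ols_slope(constant_x, y)
    slope_defined = True
except ValueError:
    slope_defined = False

# With variability in X the same function returns a slope.
varying_slope = ols_slope([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])

print(sample_variance(constant_x), slope_defined, varying_slope)
```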
Household Sample Surveys in Developing and Transition Countries
Measurement error in household surveys: sources and measurement
Authors & Institutional Affiliations: -Daniel Kasprzyk -Mathematica Policy Research, Washington, D.C., United States of America
Violation: -Measurement error
Remedial Measures: -Quantifying the existence and magnitude of a specific type of measurement error requires advance planning and thoughtful consideration -Nevertheless, if there is sufficient concern that the issue may not be adequately resolved during survey preparations or if the source of error is particularly egregious in the survey being conducted, survey managers should take steps to design special studies to quantify the principal or problematic source of error.
ASSUMPTION 9: THE REGRESSION MODEL IS CORRECTLY SPECIFIED
Violation:
-Model specification error or model specification bias
-Omission of a relevant variable(s)
-Inclusion of an unnecessary variable(s)
-Adopting the wrong functional form
-Errors of measurement
-Incorrect specification of the stochastic error term
Graphical Approach:
-Plot the residuals against your variables. Noticeable patterns indicate possible specification errors.
Statistical Approach:
-Ramsey's "Regression Specification Error Test" (RESET test):
-Run your regression and obtain the predicted Y (Ŷ)
-Rerun your regression with variants of Ŷ (for example Ŷ² and Ŷ³) on the right-hand side
-Conduct an F-test to evaluate the joint significance of the Ŷ terms
-If the F-test indicates that the Ŷ terms improve the fit of the model, then the model is likely misspecified
Effect/s of the violation in the model:
-If the omitted variable is correlated with the included variable, then the estimates of the constant and slope coefficients are biased and inconsistent. In other words, the bias does not disappear as the sample size gets larger.
-If the omitted variable is not correlated with the included variables, then the slope coefficient is unbiased. However, the coefficient on the constant term remains biased.
-The disturbance variance σ² is incorrectly estimated
-The estimated variance of the slope coefficient is a biased estimator of the variance of the true estimator from the fully-specified model
-Confidence intervals and hypothesis testing procedures are likely to give misleading results about the significance of parameters
Remedial Measures:
-Reformulate model
-Remove the irrelevant variable
-Transformation of variables
The Asymmetric Effect of Income on Import Demand in Greece
Authors & Institutional Affiliations: -Ionna C. Bardakas -Bank of Greece
Date of Publication & Volume No: -May 2013, No.159
Violation: -Model specification error
How was it detected: -RESET test
Remedial Measures: -Short-run error correction equation is estimated, insignificant variables are discarded and a more parsimonious specification adopted.
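The RESET steps listed above can be sketched in code. This is a minimal illustration, not the procedure of the cited paper: the data are made up with a quadratic true relationship, only the square of the fitted values is added, and the helpers `solve` and `ols` are hypothetical names for a small normal-equations solver written so that the example needs no external libraries.

```python
import math

def solve(a, b):
    """Solve the linear system a * beta = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta

def ols(columns, y):
    """Regress y on the given columns; return (fitted values, residual sum of squares)."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * columns[j][t] for j in range(k)) for t in range(len(y))]
    rss = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    return fitted, rss

# Hypothetical data: the true relationship is quadratic plus a small wiggle.
n = 20
x = [0.1 * i for i in range(1, n + 1)]
y = [xi ** 2 + 0.05 * math.sin(i) for i, xi in enumerate(x)]
const = [1.0] * n

# Step 1: run the (misspecified) linear regression and obtain the predicted Y.
y_hat, rss_restricted = ols([const, x], y)
# Step 2: rerun with a variant of Y-hat (here its square) on the right-hand side.
_, rss_unrestricted = ols([const, x, [f ** 2 for f in y_hat]], y)
# Step 3: F-test for the significance of the added Y-hat term.
q, k = 1, 3  # q added terms, k parameters in the augmented regression
f_stat = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k))

print(f_stat)
```

The statistic is compared with the F critical value for (q, n - k) degrees of freedom; a large value, as here, indicates that the linear specification is likely misspecified.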
ASSUMPTION 10: THERE IS NO MULTICOLLINEARITY
Violation:
-Multicollinearity
Graphical Approach:
-Scatterplot
Statistical Approach:
-Regress and look for a 'high' R² but few significant t ratios
Effect/s of the violation in the model:
-The coefficients cannot all be estimated precisely
-Standard errors will be infinite (under perfect multicollinearity)
Remedial Measures:
-Check for errors or problematic computations of predictor variables
-Eliminate one of the redundant variables
-Average the redundant variables
-Transformation of variables
-New data (extend the time series, change the nature or source of the data)
-Do nothing
Population Ageing and Health Care Expenditure: New Evidence on the “Red Herring”
Authors & Institutional Affiliations: -Peter Zweifel -Stefan Felder -Andreas Werblow
Publication/Journal Title: -The Geneva Papers on Risk and Insurance
Date of Publication & Volume No: -October 2004, Vol.29 No.4
Violation: -Multicollinearity
How was it detected: -An OLS regression of the inverse Mills ratio λ on the explanatory variables results in an R² of 0.9897, suggesting almost perfect linearity
Remedial Measures: -Multicollinearity is at least mitigated by employing a two-part model in addition to the Heckman model. The two-part model separates the selection part from the equation that explains the level of HCE. Thus, the correlation between the selection term λ and the age variables as a source of multicollinearity is eliminated.
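The auxiliary-regression R² reported in the sample study can be translated into a variance inflation factor, VIF = 1 / (1 - R²). The VIF is not mentioned in the project itself; it is brought in here as a common companion diagnostic, and the function name is illustrative.

```python
def variance_inflation_factor(aux_r_squared):
    """VIF = 1 / (1 - R^2), with R^2 from regressing one regressor on the others."""
    if not 0.0 <= aux_r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1); at R^2 = 1 the VIF is unbounded")
    return 1.0 / (1.0 - aux_r_squared)

# R^2 of 0.9897 is the value reported in the sample study above.
vif_reported = variance_inflation_factor(0.9897)
# A hypothetical moderate case for comparison.
vif_moderate = variance_inflation_factor(0.5)

print(vif_reported, vif_moderate)
```

An R² of 0.9897 implies that the variance of the affected coefficient is inflated by a factor of about 97 relative to the case of no collinearity, which is consistent with the imprecise estimates listed under the effects.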
ASSUMPTION 11: NORMALITY OF DISTURBANCES
Violation:
-Non-normality of disturbances
-Skew: non-symmetrical, one tail longer than the other
-Kurtosis: too flat or too peaked (kurtosed)
-Outliers: individual cases which are far from the distribution
Graphical Approach:
-Histograms
-Boxplots
-P-P Plots
Statistical Approach:
-Anderson-Darling test of normality
-Jarque-Bera test of normality
-Correlation Test: obtain the correlation between observed residuals and expected values under normality; compare the correlation with the critical value; reject the null hypothesis of normal errors if the correlation falls below the table value
-Shapiro-Wilk Test
Effect/s of the violation in the model:
-In finite samples, without the normality assumption the usual t and F statistics may not follow the t and F distributions
-Skew biases the mean in the direction of the skew
-With kurtosis, the standard deviation is biased, and hence so are standard errors and significance tests
Remedial Measures:
-Box-Cox Transformations
Sales Location and Supply Response among Semisubsistence Farmers in Benin
Authors & Institutional Affiliations: -Hiroyuki Takeshima -Alex Winter-Nelson -Development Strategy and Governance Division
Publication/Journal Title: -International Food Policy Research Institute Discussion Paper
Date of Publication & Volume No: -July 2010, 00999
Violation: -Nonnormality
How was it detected: -By using the Lagrange Multiplier (LM) test, which is derived from the Jarque-Bera test
Remedial Measure/s: -Certain insignificant variables are dropped from each specification when the omission of such variables leads to stronger evidence of consistency of the model, which is diagnosed by the normality test.
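The Jarque-Bera statistic referred to above combines the skewness S and kurtosis K of the residuals as JB = (n/6) * (S² + (K - 3)²/4). The sketch below computes it for two tiny made-up residual series; the series and the function name are assumptions for illustration, and since the test is asymptotic, samples this small are useful only to show the arithmetic.

```python
def jarque_bera(residuals):
    """Return (JB statistic, skewness, kurtosis) using moments about the mean."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return jb, skewness, kurtosis

# Hypothetical residuals: one symmetric series, one with a long right tail.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_skewed = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 10.0]

jb_sym, skew_sym, kurt_sym = jarque_bera(symmetric)
jb_skew, skew_skew, kurt_skew = jarque_bera(right_skewed)

print(jb_sym, skew_sym, kurt_sym)
print(jb_skew, skew_skew, kurt_skew)
```

Under the null hypothesis of normal errors the statistic is asymptotically chi-square with 2 degrees of freedom, so in large samples values above about 5.99 reject normality at the 5 percent level.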