![Page 1: Class 5 Multiple Regression CERAM February-March-April 2008 Lionel Nesta Observatoire Français des Conjonctures Economiques](https://reader035.vdocuments.mx/reader035/viewer/2022081521/5a4d1b467f8b9ab0599a35df/html5/thumbnails/1.jpg)
Class 5: Multiple Regression
CERAM February-March-April 2008
Lionel Nesta, Observatoire Français des Conjonctures Economiques
Introduction to Regression
Typically, the social scientist deals with multiple and complex webs of interactions between variables. An immediate and appealing extension of simple linear regression is to enlarge the set of explanatory variables.
Multiple regression includes several explanatory variables in the empirical model:
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + u_i$$
With K explanatory variables, the model is written compactly as:
$$y_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + u_i$$
To minimize the sum of squared errors:
$$\min_{\hat\beta_0, \dots, \hat\beta_K} \sum_{i=1}^{n} (y_i - \hat y_i)^2 = \min_{\hat\beta_0, \dots, \hat\beta_K} \sum_{i=1}^{n} \Big( y_i - \hat\beta_0 - \sum_{k=1}^{K} \hat\beta_k x_{ik} \Big)^2$$
$$\frac{\partial}{\partial \hat\beta_j} \sum_{i=1}^{n} \Big( y_i - \hat\beta_0 - \sum_{k=1}^{K} \hat\beta_k x_{ik} \Big)^2 = 0, \quad j = 0, \dots, K$$
Multivariate Least Squares Estimator
Usually, the multivariate model is described in matrix notation:
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}$$
where $\mathbf{y}$ is the vector of observations of the dependent variable, $\mathbf{X}$ the matrix of explanatory variables (including a column of ones for the constant), $\boldsymbol{\beta}$ the vector of parameters and $\mathbf{u}$ the vector of errors.
With the following least squares solution:
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}, \qquad \operatorname{cov}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}$$
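The matrix solution $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ can be sketched in a few lines of NumPy. This is only an illustration on simulated data: the function name `ols`, the sample size and the true coefficients (1, 2, -3) are assumptions made for the example, not part of the course material, and $\sigma^2$ is replaced by its usual estimate.

```python
import numpy as np

def ols(X, y):
    """Least squares for y = X beta + u.

    X must already contain a column of ones if a constant is wanted.
    Returns the estimated coefficients and their estimated covariance matrix.
    """
    n, p = X.shape                      # p = k + 1 when a constant is included
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)   # solves (X'X) b = X'y
    residuals = y - X @ beta_hat
    sigma2_hat = residuals @ residuals / (n - p)
    cov_beta = sigma2_hat * np.linalg.inv(XtX)
    return beta_hat, cov_beta

# Simulated data (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(scale=0.5, size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + u

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, cov_beta = ols(X, y)
print("estimates:", beta_hat)
print("standard errors:", np.sqrt(np.diag(cov_beta)))
```

Solving the normal equations with `np.linalg.solve` avoids forming the inverse for the coefficients themselves; the inverse is only computed for the covariance matrix.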
Assumption OLS 1
Linearity: The model is linear in its parameters.
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$
It is possible to apply nonlinear transformations to the variables (e.g. log of x) but not to the parameters, as in the following:
$$y = \beta_0 + \beta_1 x_1^{\beta_2} + u$$
OLS cannot estimate this.
Assumption OLS 2
Random Sampling: The n observations are a random sample of the whole population.
There is no selection bias in the sample. The results pertain to the whole population.
All observations are independent from one another (neither serial nor cross-sectional correlation).
Assumption OLS 3
No Perfect Collinearity: There is no perfect collinearity between independent variables.
No independent variable is constant. Each variable has variance which can be used with the variance of the dependent variable to compute the parameters.
There are no exact linear relationships among independent variables.
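To see why an exact linear relationship among regressors is a problem, the following sketch builds a variable that is an exact linear combination of two others and checks the rank of the regressor matrix. The data and the variable names are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2.0 * x1 - x2            # exact linear combination of x1 and x2

ones = np.ones(n)
X_ok = np.column_stack([ones, x1, x2])        # 3 columns, all independent
X_bad = np.column_stack([ones, x1, x2, x3])   # 4 columns, one is redundant

rank_ok = np.linalg.matrix_rank(X_ok)
rank_bad = np.linalg.matrix_rank(X_bad)
print("columns:", X_ok.shape[1], "rank:", rank_ok)
print("columns:", X_bad.shape[1], "rank:", rank_bad)
```

When the rank is lower than the number of columns, X'X cannot be inverted and the parameters are not separately identified. The same happens if a regressor is constant, since it then duplicates the column of ones.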
Assumption OLS 4
Zero Conditional Mean: The error term u has an expected value of zero.
$$E(u \mid x_1, x_2, \dots, x_k) = 0$$
Given any values of the independent variables (IV), the error term must have an expected value of zero.
In this case, all independent variables are exogenous. Otherwise, at least one IV suffers from an endogeneity problem.
Sources of endogeneity
Wrong specification of the model
Omitted variable correlated with one right-hand-side (RHS) variable
Measurement errors in the RHS variables
Mutual causation between the left-hand-side (LHS) variable and the RHS variables
Simultaneity
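The omitted-variable case can be illustrated with a small simulation. This is a sketch under invented parameter values: `x2` affects `y` and is correlated with `x1`, so leaving it out pushes part of its effect onto the coefficient of `x1`.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)      # x1 is correlated with x2
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 1.5 * x2 + u       # true coefficient on x1 is 2.0

def fit(X, y):
    """Least squares coefficients for y = X beta + u."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_full = fit(np.column_stack([ones, x1, x2]), y)   # correct specification
b_short = fit(np.column_stack([ones, x1]), y)      # x2 omitted

print("coefficient on x1, full model: ", b_full[1])
print("coefficient on x1, x2 omitted: ", b_short[1])
```

In the short regression the omitted variable is absorbed by the error term, which is then correlated with `x1`, so the zero conditional mean assumption fails and the estimate is biased upward in this setup.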
Assumption OLS 5
Homoskedasticity: The variance of the error term u, conditional on the RHS variables, is the same for all values of the RHS variables.
$$Var(u \mid x_1, x_2, \dots, x_k) = \sigma_u^2$$
Otherwise we speak of heteroskedasticity.
Assumption OLS 6
Normality of error term: The error term is independent of all RHS variables and follows a normal distribution with zero mean and variance $\sigma^2$.
$$u \sim Normal(0, \sigma^2)$$
Assumptions OLS
OLS1 Linearity
OLS2 Random Sampling
OLS3 No perfect Collinearity
OLS4 Zero Conditional Mean
OLS5 Homoskedasticity
OLS6 Normality of error term
Theorem 1
OLS1 - OLS4: Unbiasedness of OLS. The expected value of each estimated parameter is equal to the true unknown value of $\beta_j$:
$$E(\hat\beta_j) = \beta_j, \quad j = 0, 1, 2, \dots, k$$
Theorem 2
OLS1 – OLS5: Variance of the OLS estimate. The variance of the OLS estimator is
$$Var(\hat\beta_j) = \frac{\sigma_u^2}{\sum_{i=1}^{n} (x_{ij} - \bar x_j)^2 \, (1 - R_j^2)}$$
… where $R_j^2$ is the R-squared from regressing $x_j$ on all other independent variables. But how can we measure $\sigma_u^2$?
Theorem 3
OLS1 – OLS5: The variance of the error term is estimated without bias by
$$\hat\sigma_u^2 = \frac{\sum_i (y_i - \hat y_i)^2}{n - k - 1} = \frac{\sum_i \hat u_i^2}{n - k - 1}, \qquad E(\hat\sigma_u^2) = \sigma_u^2$$
Its square root, $\hat\sigma_u$, is the standard error of the regression, also called the standard error of the estimate or the root mean squared error (RMSE).
Standard Error of Each Parameter
Combining theorems 2 and 3 yields:
$$se(\hat\beta_j) = \frac{\hat\sigma_u}{\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar x_j)^2 \, (1 - R_j^2)}}$$
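As a check on the formula for the standard error of each parameter, the sketch below computes $se(\hat\beta_1)$ twice on simulated data: once from the scalar expression (residual standard error, variation in $x_1$, and $R_1^2$ from regressing $x_1$ on the other regressor), and once from the diagonal of $\hat\sigma_u^2 (\mathbf{X}'\mathbf{X})^{-1}$. The data, the helper `r_squared` and all numerical values are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 2
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)          # correlated regressors
y = 0.5 + 1.0 * x1 - 2.0 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - k - 1))   # standard error of the regression

def r_squared(target, others):
    """R-squared from regressing `target` on `others` plus a constant."""
    Z = np.column_stack([np.ones(len(target)), others])
    fitted = Z @ np.linalg.lstsq(Z, target, rcond=None)[0]
    ss_res = np.sum((target - fitted) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Scalar formula for the coefficient on x1
r2_1 = r_squared(x1, x2)
sst_1 = np.sum((x1 - x1.mean()) ** 2)
se_formula = sigma_hat / np.sqrt(sst_1 * (1.0 - r2_1))

# Matrix formula: square root of the matching diagonal element
cov_beta = sigma_hat**2 * np.linalg.inv(X.T @ X)
se_matrix = np.sqrt(cov_beta[1, 1])

print("se from scalar formula:", se_formula)
print("se from matrix formula:", se_matrix)
```

The two numbers agree up to floating-point error. The scalar form makes visible that a high $R_j^2$ (strong correlation of $x_j$ with the other regressors) inflates the standard error.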
Theorem 4
Under assumptions OLS1 – OLS5, the estimators $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$ are the best linear unbiased estimators (BLUE) of $\beta_0, \beta_1, \dots, \beta_k$.
Assumptions OLS1 – OLS5 are known as the Gauss-Markov assumptions. The Gauss-Markov theorem states that under OLS1-5, OLS is the best linear unbiased estimation method:
The estimates are unbiased (OLS1-4)
The estimates have the smallest variance among linear unbiased estimators (OLS5)
Theorem 5
Under assumptions OLS1 – OLS6, the standardized OLS estimates follow a t distribution:
$$\frac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \sim t_{n-k-1}$$
Extension of theorem 5: Inference
We can define the confidence interval of β at 95%:
$$\hat\beta_j \pm t_{.025} \cdot \frac{\hat\sigma_u}{\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar x_j)^2 \, (1 - R_j^2)}}$$
If the 95% CI does not include 0, then β is significantly different from 0.
Student t Test for H0: βj=0
We are also in a position to draw inferences on βj:
H0: βj = 0
H1: βj ≠ 0
$$t = \frac{\hat\beta_j - 0}{se(\hat\beta_j)} = \frac{\hat\beta_j}{se(\hat\beta_j)}$$
Rule of decision
Accept H0 if | t | < tα/2
Reject H0 if | t | ≥ tα/2
Summary
OLS1 Linearity
OLS2 Random Sampling
OLS3 No perfect Collinearity
OLS4 Zero Conditional Mean
OLS5 Homoskedasticity
OLS6 Normality of error term
T1: Unbiasedness (OLS1 – OLS4)
T2-T4: BLUE (OLS1 – OLS5)
T5: $\hat\beta \sim t$ (OLS1 – OLS6)
The knowledge production function
Application 1: Seminal model
$$PAT = f(RD, SIZE)$$
$$PAT = A \cdot RD^{\beta_1} \cdot SIZE^{\beta_2} \cdot \exp(u)$$
Taking logs (lowercase letters denote logs and $\alpha = \log A$):
$$pat = \alpha + \beta_1\, rd + \beta_2\, size + u$$
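The move from the multiplicative model to the log-linear one can be checked on simulated data. In the sketch below, the values of `A`, `beta1`, `beta2` and the distributions of `RD` and `SIZE` are hypothetical; they are chosen only to show that OLS on the logged variables recovers the exponents of the multiplicative form.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
A, beta1, beta2 = 0.5, 0.6, 0.3              # hypothetical true values

# Positive, skewed regressors (log-normal), as is typical for R&D and size
RD = np.exp(rng.normal(2.0, 1.0, size=n))
SIZE = np.exp(rng.normal(5.0, 1.0, size=n))
u = rng.normal(scale=0.3, size=n)

# Multiplicative knowledge production function
PAT = A * RD**beta1 * SIZE**beta2 * np.exp(u)

# Log transformation makes the model linear in its parameters
pat, rd, size = np.log(PAT), np.log(RD), np.log(SIZE)
X = np.column_stack([np.ones(n), rd, size])
alpha_hat, b1_hat, b2_hat = np.linalg.lstsq(X, pat, rcond=None)[0]

print("alpha_hat:", alpha_hat, "(log A =", np.log(A), ")")
print("beta1_hat:", b1_hat)
print("beta2_hat:", b2_hat)
```

Because both sides are in logs, the estimated coefficients read as elasticities: a 1% increase in RD is associated with roughly a `beta1`% increase in PAT, holding SIZE constant.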
The knowledge production function
Application 2: Changing specification
$$PAT = f(RD, SIZE)$$
$$PAT = A \left(\frac{RD}{SIZE}\right)^{\beta_1} SIZE^{\beta_2} \exp(u)$$
$$pat = \alpha + \beta_1 \log\left(\frac{RD}{SIZE}\right) + \beta_2\, size + u$$
The knowledge production function
Application 3: Adding variables
$$PAT = f(RD, SIZE, SPE)$$
$$PAT = A \left(\frac{RD}{SIZE}\right)^{\beta_1} SIZE^{\beta_2} \exp(\beta_3\, SPE + u)$$
$$pat = \alpha + \beta_1 \log\left(\frac{RD}{SIZE}\right) + \beta_2\, size + \beta_3\, SPE + u$$
The knowledge production function
Application 4: Dummy variables
$$PAT = f(RD, SIZE, SPE, BIO)$$
$$PAT = A \left(\frac{RD}{SIZE}\right)^{\beta_1} SIZE^{\beta_2} \exp(\beta_3\, SPE + \beta_4\, BIO + u)$$
$$pat = \alpha + \beta_1 \log\left(\frac{RD}{SIZE}\right) + \beta_2\, size + \beta_3\, SPE + \beta_4\, BIO + u$$
Application 4: Dummy variables
Coefficients (dependent variable: lnpatent)

| Model 1 | B (unstandardized) | Std. error | Beta (standardized) | t | Sig. | 95% CI for B, lower bound | 95% CI for B, upper bound |
|---|---|---|---|---|---|---|---|
| (constant) | -5.465 | .616 | | -8.864 | .000 | -6.676 | -4.253 |
| lnassets | .556 | .042 | .909 | 13.326 | .000 | .474 | .638 |
| lnrd_assets | .492 | .080 | .313 | 6.123 | .000 | .334 | .650 |
| spe | .421 | .145 | .118 | 2.912 | .004 | .137 | .706 |
| bio | 1.657 | .168 | .665 | 9.835 | .000 | 1.326 | 1.988 |
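The t statistics and confidence intervals in the coefficient table follow from the B and standard error columns alone. The sketch below recomputes them from the reported values. The critical value 1.96 is an assumption (the large-sample approximation to $t_{.025}$), since the number of observations is not reported here; the function name `summarize` is likewise illustrative.

```python
# (variable, B, standard error) as reported in the coefficient table
rows = [
    ("constant", -5.465, 0.616),
    ("lnassets", 0.556, 0.042),
    ("lnrd_assets", 0.492, 0.080),
    ("spe", 0.421, 0.145),
    ("bio", 1.657, 0.168),
]

# Assumption: large-sample approximation to t_.025 (n is not reported)
T_CRIT = 1.96

def summarize(b, se, t_crit=T_CRIT):
    """Return t statistic, 95% CI bounds and the test decision for H0: beta = 0."""
    t = b / se
    lower = b - t_crit * se
    upper = b + t_crit * se
    reject_h0 = abs(t) >= t_crit
    return t, lower, upper, reject_h0

results = {name: summarize(b, se) for name, b, se in rows}

for name, (t, lower, upper, reject_h0) in results.items():
    print(f"{name:12s} t = {t:7.3f}  CI = [{lower:6.3f}, {upper:6.3f}]  reject H0: {reject_h0}")
```

The recomputed values differ slightly from the table because B and the standard errors are rounded to three decimals there. In every row the interval excludes zero and |t| exceeds the critical value, so the two decision rules agree.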
Application 4: Dummy variables
[Figure: Patent (lnpatent) on the vertical axis against Size (lnasset) on the horizontal axis. Two parallel lines are drawn. DBF line: $\beta_4 + \beta_2\, size$. LDF line: $\beta_2\, size$. Both lines have slope $\beta_2$; the DBF line is shifted by $\beta_4$.]
The knowledge production function
Application 5: Interacting Variables
$$PAT = f(RD, SIZE, SPE, BIO)$$
$$PAT = A \left(\frac{RD}{SIZE}\right)^{\beta_1} SIZE^{\beta_2} \exp(\beta_3\, SPE + \beta_4\, BIO + \beta_5\, BIO \times size + u)$$
$$pat = \alpha + \beta_1 \log\left(\frac{RD}{SIZE}\right) + \beta_2\, size + \beta_3\, SPE + \beta_4\, BIO + \beta_5\, BIO \times size + u$$
Application 5: Interacting Variables
Coefficients (dependent variable: lnpatent)

| Model 1 | B (unstandardized) | Std. error | Beta (standardized) | t | Sig. | 95% CI for B, lower bound | 95% CI for B, upper bound |
|---|---|---|---|---|---|---|---|
| (constant) | -6.483 | .843 | | -7.693 | .000 | -8.139 | -4.827 |
| lnassets | .620 | .055 | 1.013 | 11.241 | .000 | .511 | .728 |
| lnrd_assets | .474 | .081 | .301 | 5.863 | .000 | .315 | .633 |
| spe | .413 | .144 | .115 | 2.862 | .004 | .129 | .697 |
| bio | 3.592 | 1.108 | 1.441 | 3.242 | .001 | 1.415 | 5.770 |
| sizebio | -.144 | .081 | -.693 | -1.767 | .078 | -.303 | .016 |
Application 5: Interacting variables
[Figure: Patent (lnpatent) on the vertical axis against Size (lnasset) on the horizontal axis. DBF line: $\beta_4 + \beta_2\, size + \beta_5\, size \times bio$, with slope $\beta_2 + \beta_5$. LDF line: $\beta_2\, size$, with slope $\beta_2$. The two lines differ in both intercept and slope.]
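With an interaction term, the size slope is no longer common to both groups. The arithmetic can be sketched from the coefficients reported for the interaction model (lnassets, bio and sizebio). The function names are illustrative, and the coding bio = 1 for DBF and bio = 0 for LDF is read off the figure labels.

```python
# Coefficients B as reported for the model with the size x bio interaction
coef = {
    "constant": -6.483,
    "lnassets": 0.620,      # beta_2
    "lnrd_assets": 0.474,   # beta_1
    "spe": 0.413,           # beta_3
    "bio": 3.592,           # beta_4
    "sizebio": -0.144,      # beta_5
}

def size_slope(bio):
    """Slope of lnpatent with respect to lnassets for a given value of the dummy."""
    return coef["lnassets"] + coef["sizebio"] * bio

def bio_gap(lnassets):
    """Difference in predicted lnpatent between bio = 1 and bio = 0 at a given size."""
    return coef["bio"] + coef["sizebio"] * lnassets

slope_ldf = size_slope(0)   # beta_2
slope_dbf = size_slope(1)   # beta_2 + beta_5

# Size (in logs) at which the two fitted lines would cross
crossover = -coef["bio"] / coef["sizebio"]

print("slope, bio = 0:", slope_ldf)
print("slope, bio = 1:", slope_dbf)
print("lines cross at lnassets =", round(crossover, 2))
```

The interaction coefficient is only significant at the 10% level in the reported table (Sig. = .078), so the difference in slopes should be read with that caveat.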