Econometrics II
Andrea Beccarini
Winter 2011/2012
Introduction
• Econometrics: application of statistical methods to empirical research in economics
• Compare theory with facts (data)
• Statistics: foundation of econometrics
Module Statistics and Module Empirical Methods
• Descriptive statistics (Statistik I): How to process data? How to display data?
• Probability theory and statistical inference (Statistik II): Estimation of unknown parameters from random samples; hypothesis tests
• Empirical research in economics (Empirische Wirtschaftsforschung): Applications of the linear model; statistical software
Module Statistics/Econometrics/Empirical Economics I
• Advanced Statistics: Probability theory; multidimensional random variables; estimation and hypothesis testing
• Econometrics I: Simple and multiple linear regression model
• Econometrics II: Extensions of the multivariate linear regression model; simultaneous equation systems; dynamic models
Module Statistics/Econometrics/Empirical Economics II
• Time series analysis: Stochastic processes; stationarity; ergodicity; linear processes; unit root processes; cointegration; vector-autoregressive models
• One further special course or seminar, e.g.
  – Financial econometrics
  – Panel data econometrics
  – Introduction to R
  – Poverty and inequality
  – Statistical inference, bootstrap
  – Wage and earnings dynamics
Literature: Statistical basics
• Karl Mosler and Friedrich Schmid, Wahrscheinlichkeitsrechnung und schließende Statistik, 2nd ed., Springer, 2006.
• Aris Spanos, Statistical Foundations of Econometric Modelling, Cambridge University Press, 1986.
• Mood, A.M., Graybill, F.A. and D.C. Boes (1974). Introduction to the Theory of Statistics, 3rd ed., McGraw-Hill, Tokyo.
Literature: Econometrics
• Main book for this course: Ludwig von Auer, Ökonometrie: Eine Einführung, 4th ed., Springer, 2005.
• Alternatively: William E. Griffiths, R. Carter Hill and George G. Judge, Learning and Practicing Econometrics, John Wiley & Sons, 1993.
• James Stock and Mark Watson, Introduction to Econometrics, Addison Wesley, 2003.
• Russell Davidson and James MacKinnon, Econometric Theory and Methods, Oxford University Press, 2004.
Class
• Class teacher: Rainer Schüssler
• Time and location: Tue, 14.00-16.00, CAWM1
• A detailed schedule is available on the home page of this course: http://www.wiwi.uni-muenster.de/statistik → Studium → Aktuelle Veranstaltungen → Econometrics II
Outline
• Very brief revision of Econometrics I (chap. 8 to 14)
• Violations of model assumptions (chap. 15 to 19, 21)
• Stochastic exogenous variables (chap. 20)
• Dynamic models (chap. 22)
• Interdependent equation systems (chap. 23)
Multiple linear regression model (revision)
Assumption A1: No relevant exogenous variable is omitted from the econometric model, and all exogenous variables in the model are relevant
Assumption A2: The true functional dependence between X and y is linear
Assumption A3: The parameters β are constant for all T observations (x_t, y_t)
Assumptions B1 to B4:
u ~ N(0, σ²I_T)
Assumption C1: The exogenous variables x_1t, …, x_Kt are not stochastic, but can be controlled as in an experimental situation
Assumption C2: No perfect multicollinearity: rank(X) = K + 1
• Econometric model: y = Xβ + u
• Point estimator (OLS): β̂ = (X′X)⁻¹X′y
• Estimated model: ŷ = Xβ̂
• Residuals: û = y − ŷ
• Coefficient of determination:
R² = (S_yy − S_ûû)/S_yy = S_ŷŷ/S_yy = Σ_{k=1}^K β̂_k S_ky / S_yy
• Unbiasedness: E(β̂) = β
• Covariance matrix of β̂: V(β̂) = σ²(X′X)⁻¹
• Gauss-Markov theorem: β̂ is BLUE
• Distribution of y: y ~ N(Xβ, σ²I_T)
• Distribution of β̂: β̂ ~ N(β, σ²(X′X)⁻¹)
• Estimator of error term variance: σ̂² = S_ûû/(T − K − 1)
• Unbiasedness: E(σ̂²) = σ²
• Interval estimator of the component β_k of β:
[β̂_k − t_{a/2}·ŝe(β̂_k), β̂_k + t_{a/2}·ŝe(β̂_k)]
• t-test:
H0: r′β = q
H1: r′β ≠ q
where r = [r_0, r_1, …, r_K]′
• Test statistic: t = (r′β̂ − q)/ŝe(r′β̂)
• F-test:
H0: Rβ = q
H1: Rβ ≠ q
• Test statistic:
F = [(S⁰_ûû − S_ûû)/L] / [S_ûû/(T − K − 1)]
or
F = [(Rβ̂ − q)′[R(X′X)⁻¹R′]⁻¹(Rβ̂ − q)/L] / [û′û/(T − K − 1)]
where L is the number of restrictions in H0
• Forecasting: Let x_0 = [1, x_10, x_20, …, x_K0]′ be the vector of exogenous variables
• Point forecast: ŷ_0 = x_0′β̂
• Variance of the forecast error:
Var(y_0 − ŷ_0) = σ²(1 + x_0′(X′X)⁻¹x_0)
• Violation of A1: Omitted or redundant variables
• Violation of A2: Nonlinear functional forms
Qualitative exogenous variables
• A3: The parameters β are constant for all T observations (x_t, y_t)
• Example: Wage y_t depends on both education x_1t and age x_2t:
y_t = α + β₁x_1t + β₂x_2t + u_t
• Suppose the parameters differ between men and women:
y_t = α_M + β_M1 x_1t + β_M2 x_2t + u_t
y_t = α_F + β_F1 x_1t + β_F2 x_2t + u_t
• What happens if the gender difference is ignored? [dummy.R]
• Introduce a dummy variable
D_t = 0 if male, 1 if female
• Extended model
y_t = α + γD_t + β₁x_1t + δ₁D_t x_1t + β₂x_2t + δ₂D_t x_2t + u_t
• Submodels for men (D_t = 0) and women (D_t = 1)
y_t = α + β₁x_1t + β₂x_2t + u_t
y_t = (α + γ) + (β₁ + δ₁)x_1t + (β₂ + δ₂)x_2t + u_t
• Interpretation of the coefficients γ, δ₁, δ₂
• Estimation of the model by OLS?
• What does the matrix of exogenous variables X look like?
• Apply t- or F-tests to check parameter constancy, e.g.
H0: γ = δ₁ = δ₂ = 0
• Often, the models just include a level effect, i.e.
y_t = α + γD_t + β₁x_1t + β₂x_2t + u_t
(use a t-test for γ)
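The extended dummy-variable model and the F-test of H0: γ = δ₁ = δ₂ = 0 can be sketched as follows. The course illustration is in R (dummy.R); this Python/numpy version generates artificial wage data, so all parameter values here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
x1 = rng.uniform(8, 18, T)                 # education (years), illustrative
x2 = rng.uniform(20, 60, T)                # age, illustrative
D = rng.integers(0, 2, T).astype(float)    # 0 = male, 1 = female

# assumed true parameters differ between the groups
y = 1.0 + 0.5*D + (0.8 + 0.3*D)*x1 + (0.1 - 0.05*D)*x2 + rng.normal(0, 1, T)

# unrestricted model: level dummy and slope dummies (interactions)
Xu = np.column_stack([np.ones(T), D, x1, D*x1, x2, D*x2])
bu, *_ = np.linalg.lstsq(Xu, y, rcond=None)
Suu_u = np.sum((y - Xu @ bu)**2)

# restricted model under H0: gamma = delta1 = delta2 = 0
Xr = np.column_stack([np.ones(T), x1, x2])
br, *_ = np.linalg.lstsq(Xr, y, rcond=None)
Suu_r = np.sum((y - Xr @ br)**2)

L = 3                                      # number of restrictions
F = ((Suu_r - Suu_u)/L) / (Suu_u/(T - Xu.shape[1]))
print(bu.round(2), round(F, 1))
```

With data generated under the alternative, F is far above any conventional critical value, so parameter constancy is rejected.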
• If the qualitative exogenous variable has more than two values, we need more than one dummy variable
• Example: Religion (protestant, catholic, other)
D_prot,t = 1 if protestant, 0 if catholic or other
D_cath,t = 1 if catholic, 0 if protestant or other
• Interpretation of the coefficients?
• If there are two or more qualitative exogenous variables, interaction terms can be added
• Example: Gender and citizenship
D_1t = 0 if male, 1 if female
D_2t = 0 if German citizenship, 1 else
• Interpretation of the coefficients γ₁, γ₂, δ in the two models
y_t = α + γ₁D_1t + γ₂D_2t + βx_t + u_t
y_t = α + γ₁D_1t + γ₂D_2t + δD_1t D_2t + βx_t + u_t
• What happens if there are two dummy variables
D_female,t = 0 if male, 1 if female
D_male,t = 1 if male, 0 if female
• What happens if the dummy variable is coded as
D_t = 1 if male, 2 if female
• Compare the joint dummy variable model
y_t = α + γD_t + β₁x_1t + δ₁D_t x_1t + β₂x_2t + δ₂D_t x_2t + u_t
with the two separate models
y_t = α_M + β_M1 x_1t + β_M2 x_2t + u_t for men
y_t = α_F + β_F1 x_1t + β_F2 x_2t + u_t for women
[dummycomparison.R]
• Questions:
1. Why are the point estimates identical? [1]
2. Why is the sum of squared residuals identical? [2]
3. Why are the standard errors different? [3]
Heteroskedasticity
• Assumption B2: Var(u_t) = σ² for t = 1, …, T
• Rent example: The rent y_t depends on the distance x_t from the city center

t  x_t   y_t      t   x_t   y_t
1  0.50  16.80    7   3.10  12.80
2  1.40  16.20    8   4.40  12.20
3  1.10  15.90    9   3.70  15.00
4  2.20  15.40    10  3.00  13.60
5  1.30  16.40    11  3.50  14.10
6  3.20  13.20    12  4.10  13.30

• The scatterplot suggests that there might be heteroskedasticity
• What are the properties of β̂ if there is heteroskedasticity? [4]
Transformation of the model
• (Restrictive and arbitrary) assumption: σ²_t = σ²x_t
• Transformation of the model:
y_t/√x_t = α(1/√x_t) + β(x_t/√x_t) + u_t/√x_t
where u_t/√x_t is the new error term, i.e.
y*_t = αz*_t + βx*_t + u*_t
• Properties of the new error term u*_t [5]
• The transformed model satisfies all A-, B- and C-assumptions!
• OLS estimation of the transformed model:
α̂* = S_z*y* / S_z*z*
β̂* = S_x*y* / S_x*x* = Σ(x*_t − x̄*)(y*_t − ȳ*) / Σ(x*_t − x̄*)² = Σ(1/x_t)(x_t − x̄)(y_t − ȳ) / Σ(1/x_t)(x_t − x̄)²
• The usual estimators
β̂ = Σ(x_t − x̄)(y_t − ȳ) / Σ(x_t − x̄)²
α̂ = ȳ − β̂x̄
are inefficient
• An unbiased estimator of Var(u*_t) = σ² is
σ̂² = S_û*û* / (T − 2)
• From σ²_t = σ²x_t we conclude that
σ̂²_t = σ̂²·x_t
is an unbiased estimator of Var(u_t)
• It can be shown that [6]
Var(β̂) = Σ(x_t − x̄)²σ²_t / S²_xx
• The usual equations
Var(β̂) = σ²/S_xx and σ̂² = S_ûû/(T − 2)
are wrong under heteroskedasticity
Goldfeld-Quandt test
• Step 1: Re-order the observations according to their x_t-values (or some other "source of heteroskedasticity")
• Step 2: Define two groups:
  – T₁ observations with low x_t-values;
  – T₂ observations with high x_t-values
Often, T₁ + T₂ = T
• Step 3: We assume σ₂² > σ₁²; hence
H0: σ₂² = σ₁²
H1: σ₂² > σ₁²
• Step 4: Separate OLS estimation for both groups; compute S¹_ûû and S²_ûû
• Step 5: Goldfeld and Quandt (1972) show that under H0
F = [S²_ûû/(T₂ − K − 1)] / [S¹_ûû/(T₁ − K − 1)]
follows an F(T₂−K−1, T₁−K−1)-distribution
• Step 6: Compare F to the critical level F_a. If F > F_a, reject H0
Numeric illustration: rentexample.R
1. Order the observations according to their x_t-values
2. Group Z: City center (T_Z = 5); Group P: Periphery (T_P = 7)
3. Null hypothesis: H0: σ²_P = σ²_Z
4. Sums of squared residuals: S^Z_ûû = 0.246 and S^P_ûû = 4.666
5. Hence, F = (4.666/5) / (0.246/3) = 11.4
6. At level a = 5% the critical value is 9.01. Reject the null hypothesis. The data indicate heteroskedasticity.
The null hypothesis that the error term variance is the same in the center and the periphery is rejected at the 5% level. Heteroskedasticity should be taken into account.
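The six steps can be reproduced with the rent data from the table above. The course file is rentexample.R; this is a Python/numpy sketch of the same computation.

```python
import numpy as np

# rent data from the slides: distance x_t, rent y_t
x = np.array([0.5, 1.4, 1.1, 2.2, 1.3, 3.2, 3.1, 4.4, 3.7, 3.0, 3.5, 4.1])
y = np.array([16.8, 16.2, 15.9, 15.4, 16.4, 13.2, 12.8, 12.2, 15.0, 13.6, 14.1, 13.3])

order = np.argsort(x)                 # step 1: sort by distance
x, y = x[order], y[order]
K = 1

def ssr(xg, yg):
    """Sum of squared residuals of a group-wise OLS fit."""
    X = np.column_stack([np.ones(len(xg)), xg])
    b, *_ = np.linalg.lstsq(X, yg, rcond=None)
    return np.sum((yg - X @ b)**2)

S1 = ssr(x[:5], y[:5])                # group Z: 5 city-center observations
S2 = ssr(x[5:], y[5:])                # group P: 7 periphery observations
T1, T2 = 5, 7
F = (S2/(T2 - K - 1)) / (S1/(T1 - K - 1))
print(round(S1, 3), round(S2, 3), round(F, 1))   # → 0.246 4.666 11.4
```

The computed sums of squared residuals and the F-statistic match the slide's numbers.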
White test
• Consider the linear regression model with two exogenous variables
y_t = α + β₁x_1t + β₂x_2t + u_t
• Step 1: H0: the error terms are homoskedastic
• Step 2: Calculate the OLS residuals û_t
• Step 3: Estimate the auxiliary regression
û²_t = γ₀ + γ₁x_1t + γ₂x_2t + γ₃x²_1t + γ₄x²_2t + γ₅x_1t x_2t + v_t
• Step 4: It can be shown that under H0
T·R² ≈ χ²_r
where r is the number of slope parameters in the auxiliary regression
• If T·R² is larger than the critical value of the χ²_r-distribution, reject H0: the squared residuals can be explained (at least partially) by the exogenous variables
• Illustration [rentexample.R]
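A sketch of the White test on simulated data; the slides' own illustration is in rentexample.R. The data-generating process with σ²_t = x_t and a single regressor is an assumption for illustration (with one regressor the auxiliary regression has r = 2 slope parameters).

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
x = rng.uniform(0.5, 4.5, T)
u = np.sqrt(x) * rng.normal(0, 1, T)      # assumed Var(u_t) = x_t: heteroskedastic
y = 10 + 2*x + u

# step 2: OLS residuals, squared
X = np.column_stack([np.ones(T), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
u2 = (y - X @ b)**2

# step 3: auxiliary regression of u^2 on the regressor and its square
Z = np.column_stack([np.ones(T), x, x**2])
g, *_ = np.linalg.lstsq(Z, u2, rcond=None)
R2 = 1 - np.sum((u2 - Z @ g)**2) / np.sum((u2 - u2.mean())**2)

stat = T * R2                             # approx. chi^2_2 under H0
print(round(stat, 1))                     # compare with chi^2_2(0.05) = 5.99
```

Since the simulated errors are strongly heteroskedastic, T·R² exceeds the 5% critical value and H0 is rejected.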
• Question: Given that heteroskedasticity has been detected, how shall we proceed?
• Answer 1: Adjust the estimation procedure → GLS or feasible GLS
• Answer 2: Still use OLS but compute the correct standard errors → White's heteroskedasticity-consistent covariance matrix estimator
Generalized least squares method (GLS)
• Verallgemeinerte Kleinste-Quadrate-Methode (VKQ)
• Regression model y = Xβ + u
• Covariance matrix of the error terms V(u) ≠ σ²I, but V(u) = σ²Ω
• Example: σ²_t = σ²x_kt; then
Ω = diag(x_k1, …, x_kT)
• Transformation of the model: Since Ω is positive definite, there is a (T × T)-matrix P with
P′P = Ω⁻¹
• Example: If Ω = diag(x_k1, …, x_kT), then
P = diag(1/√x_k1, …, 1/√x_kT)
• From P′P = Ω⁻¹ it follows that
PΩP′ = I_T
• Pre-multiplication of y = Xβ + u by P yields
Py = PXβ + Pu
y* = X*β + u*
• Properties of u* [7]
• The transformed model satisfies all A-, B- and C-assumptions
• Derivation of the GLS estimator β̂_VKQ [8]
• Covariance matrices of β̂_VKQ and β̂ [9]
• Estimation of σ² by
σ̂² = û*′û*/(T − K − 1) = û′Ω⁻¹û/(T − K − 1)
• Ignoring heteroskedasticity one would use
V̂(β̂) = σ̂²(X′X)⁻¹ with σ̂² = û′û/(T − K − 1)
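The equivalence between the GLS formula and OLS on the P-transformed model can be checked numerically on the rent data, under the assumed pattern σ²_t = σ²x_t (so Ω = diag(x_t)).

```python
import numpy as np

# rent data from the slides (distance x_t, rent y_t)
x = np.array([0.5, 1.4, 1.1, 2.2, 1.3, 3.2, 3.1, 4.4, 3.7, 3.0, 3.5, 4.1])
y = np.array([16.8, 16.2, 15.9, 15.4, 16.4, 13.2, 12.8, 12.2, 15.0, 13.6, 14.1, 13.3])
X = np.column_stack([np.ones(len(x)), x])

# assumed form of heteroskedasticity: sigma_t^2 = sigma^2 * x_t, i.e. Omega = diag(x_t)
Omega_inv = np.diag(1.0/x)
b_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# equivalently: OLS on the model transformed with P = diag(1/sqrt(x_t))
P = np.diag(1.0/np.sqrt(x))
b_trans, *_ = np.linalg.lstsq(P @ X, P @ y, rcond=None)
print(b_gls.round(3), b_trans.round(3))
```

Both routes give the same coefficient vector, with a negative distance slope as the scatterplot suggests.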
• Interval estimators and hypothesis tests would not work correctly
• What happens if Ω is unknown?
• Example:
W = σ²Ω = diag(σ²_I, …, σ²_I, σ²_II, …, σ²_II)
(the first group of observations has variance σ²_I, the second group σ²_II)
• Feasible Generalized Least Squares (FGLS), Geschätzte verallgemeinerte Kleinste-Quadrate (GVKQ)
• First, estimate the unknown quantities in W = σ²Ω
• The FGLS estimator is β̂_FGLS = (X′Ŵ⁻¹X)⁻¹X′Ŵ⁻¹y
• Estimated covariance matrix V̂(β̂_FGLS) = (X′Ŵ⁻¹X)⁻¹
• What to do if there is no information at all about the form of heteroskedasticity?
White's heteroskedasticity-consistent covariance matrix estimator
• Davidson and MacKinnon, chap. 5.5
• Econometric model y = Xβ + u
• Covariance matrix V(u) = W with W = diag(σ²₁, …, σ²_T)
• OLS estimator β̂ = (X′X)⁻¹X′y
• Covariance matrix
V(β̂) = (X′X)⁻¹X′WX(X′X)⁻¹
• Consistent estimation of W is impossible
• White (1980): Consistent estimation of
Σ = (1/T)X′WX = (1/T)Σ_{t=1}^T σ²_t x_t x_t′
is possible!
• Consistent estimator of Σ:
Σ̂ = (1/T)Σ_{t=1}^T û²_t x_t x_t′
• Estimated covariance matrix
V̂(β̂) = (X′X)⁻¹X′ŴX(X′X)⁻¹
with Ŵ = diag(û²₁, …, û²_T)
• Sandwich estimator
• Illustration [rentexample.R]
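A minimal sketch of the sandwich estimator, on simulated heteroskedastic data (the data-generating process is an assumption for illustration), compared with the conventional OLS covariance, which is invalid here.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200
x = rng.uniform(0.5, 4.5, T)
u = np.sqrt(x) * rng.normal(0, 1, T)       # assumed heteroskedastic errors
y = 10 + 2*x + u
X = np.column_stack([np.ones(T), x])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
uhat = y - X @ b

# sandwich: (X'X)^-1 X' W_hat X (X'X)^-1 with W_hat = diag(uhat_t^2)
meat = X.T @ (uhat[:, None]**2 * X)
V_white = XtX_inv @ meat @ XtX_inv

# conventional OLS covariance for comparison (wrong under heteroskedasticity)
s2 = uhat @ uhat / (T - 2)
V_ols = s2 * XtX_inv
print(np.sqrt(np.diag(V_white)).round(3), np.sqrt(np.diag(V_ols)).round(3))
```

The diagonal of V_white gives heteroskedasticity-consistent standard errors for the intercept and slope.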
Autocorrelation
• Assumption B3: The error terms are uncorrelated,
Cov(u_t, u_s) = 0 for all t ≠ s
• Example [waterfilter.R]: Demand function
y_t = α + βx_t + u_t
for water filters; quantity sold y_t and prices x_t for the months January 2001 to December 2002
• Assumption about the form of autocorrelation:
u_t = ρu_{t−1} + e_t with −1 < ρ < 1
• Assumption about e_t: e_t ~ NID(0, σ²_e)
• Properties of u_t [10]
• Moment functions of u_t:
E(u_t) = 0
Var(u_t) = σ²_e/(1 − ρ²)
Cov(u_t, u_{t−1}) = ρ·σ²_e/(1 − ρ²)
Cov(u_t, u_{t−j}) = ρ^j·σ²_e/(1 − ρ²)
• B1, B2 and B4 are still satisfied
• But B3 is violated
• Transformation of the model [11]
y_t − ρy_{t−1} = (1 − ρ)α + β(x_t − ρx_{t−1}) + e_t
• Define
y*_t = y_t − ρy_{t−1}
α* = (1 − ρ)α
x*_t = x_t − ρx_{t−1}
• Then
y*_t = α* + βx*_t + e_t
satisfies all A-, B- and C-assumptions (if ρ were known)
• Hence, OLS estimation is inefficient
• Consequences for interval estimation and hypothesis tests?
• The usual OLS formulas
Var(β̂) = σ²/S_xx and σ̂² = S_ûû/(T − 2)
are invalid
• Consequences are the same as in the case of heteroskedasticity
Diagnosis
• Plot the residuals û_t over time, or plot the pairs (û_{t−1}, û_t)
• Example (demand function)
• Estimator for ρ: Because of u_t = ρu_{t−1} + e_t we can estimate ρ by the regression
û_t = ρû_{t−1} + e*_t
• Least squares estimator
ρ̂ = Σ_{t=2}^T û_t û_{t−1} / Σ_{t=2}^T û²_{t−1}
• Numeric illustration: From the residuals we calculate
ρ̂ = 1481594/2557515 = 0.58
• Due to the two-step approach the ordinary t-test is no longer exact
Durbin-Watson test
• Step 1: Set up the hypotheses
H0: ρ ≤ 0
H1: ρ > 0
• Step 2: Compute the Durbin-Watson test statistic
d = Σ_{t=2}^T (û_t − û_{t−1})² / Σ_{t=1}^T û²_t
• Numeric illustration:
d = 2101281/2761231 = 0.76
• Relation between d and ρ̂ [12]:
d ≈ 2(1 − ρ̂)
• Step 3: Find the critical value d_a (using econometric software). If d < d_a, reject H0
• Problem: The critical value d_a depends on X
• If the software cannot compute d_a there are tables providing an upper boundary d^H_a and a lower boundary d^L_a for d_a
• Step 4: Compare the test statistic d to d^L_a and d^H_a
• Decision rule:
  – if d < d^L_0.05, reject H0: ρ ≤ 0;
  – if d > d^H_0.05, do not reject H0: ρ ≤ 0;
  – if d^L_0.05 ≤ d ≤ d^H_0.05, leave the decision open
• Numeric illustration: For K = 1 and T = 24, Table T5 gives
d^L_0.05 = 1.27 and d^H_0.05 = 1.45
Since d = 0.76 < d^L_0.05, reject the null hypothesis of no positive autocorrelation; the residuals are positively correlated
• Disadvantages of the Durbin-Watson test:
  – no decision in some cases
  – lagged endogenous variables are not allowed
  – only applicable for AR(1)-processes
• Alternative tests for autocorrelation are available in many software packages
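The statistic d and the approximation d ≈ 2(1 − ρ̂) can be illustrated on simulated AR(1) errors; ρ = 0.6 and the regression design are assumed values for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 200
x = rng.uniform(0, 10, T)
e = rng.normal(0, 1, T)
u = np.zeros(T)
for t in range(1, T):                 # AR(1) errors: u_t = 0.6 u_{t-1} + e_t
    u[t] = 0.6*u[t-1] + e[t]
y = 2 + 0.5*x + u

X = np.column_stack([np.ones(T), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ b                         # OLS residuals

d = np.sum(np.diff(r)**2) / np.sum(r**2)
rho_hat = np.sum(r[1:]*r[:-1]) / np.sum(r[:-1]**2)
print(round(d, 2), round(2*(1 - rho_hat), 2))   # d ≈ 2(1 - rho_hat)
```

With positively autocorrelated errors, d comes out well below 2, in line with the decision rule above.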
GLS and autocorrelation
• Regression model y = Xβ + u
• Covariance matrix V(u) = σ²Ω with
Ω =
[ 1        ρ        …  ρ^(T−1) ]
[ ρ        1        …  ρ^(T−2) ]
[ ⋮        ⋮        ⋱  ⋮       ]
[ ρ^(T−1)  ρ^(T−2)  …  1       ]
• Transformation of the model using the matrix P satisfying P′P = Ω⁻¹
• One can verify that
P =
[ √(1−ρ²)  0   0  …   0 ]
[ −ρ       1   0  …   0 ]
[ 0        −ρ  1  …   0 ]
[ ⋮        ⋱   ⋱  ⋱   ⋮ ]
[ 0        …   0  −ρ  1 ]
• The GLS estimator is the same as in the case of heteroskedasticity
• GLS estimator
β̂_GLS = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y
with covariance matrix
V(β̂_GLS) = σ²(X′Ω⁻¹X)⁻¹
• Estimator of the error term variance
σ̂² = û′Ω⁻¹û/(T − K − 1)
• GLS is not possible as ρ (and hence P) is unknown
• Hildreth-Lu approach: Lay a fine grid for ρ over [−1, 1]; choose the ρ with the smallest value of σ̂²
• Cochrane-Orcutt procedure: Estimate ρ from the OLS residuals, then apply FGLS with ρ̂; afterwards iterate
Heteroskedasticity and autocorrelation consistent covariance matrix estimation
• Newey, W.K. and West, K.D. (1987), A Simple Positive Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix, Econometrica, 55: 703-708.
• Econometric model y = Xβ + u
• Covariance matrix V(u) = W with arbitrary covariance matrix W
• OLS estimator β̂ = (X′X)⁻¹X′y
• Covariance matrix of β̂ (as before)
V(β̂) = (X′X)⁻¹X′WX(X′X)⁻¹
• The matrix W cannot be estimated consistently
• But (1/T)X′WX can be estimated consistently
• Consistent estimation of V(β̂): estimate X′WX by
Σ_{t=1}^T û²_t x_t x_t′ + Σ_{b=1}^q (1 − b/(q+1)) Â_b
where q is the number of autocorrelations to be taken into account
• The matrices
Â_b = Σ_{t=b+1}^T (x_t û_t û_{t−b} x′_{t−b} + x_{t−b} û_{t−b} û_t x′_t)
are estimators of autocorrelation matrices
• The White estimator is a special case of V̂(β̂) (if Â_b = 0 for all b)
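The estimator above can be sketched as a small function with the Bartlett weights 1 − b/(q+1); setting q = 0 reproduces the White estimator. The data below are simulated for illustration only.

```python
import numpy as np

def newey_west_cov(X, uhat, q):
    """HAC estimate of V(beta_hat) = (X'X)^-1 S (X'X)^-1, Bartlett weights."""
    XtX_inv = np.linalg.inv(X.T @ X)
    S = X.T @ (uhat[:, None]**2 * X)                # b = 0 term (White)
    for b in range(1, q + 1):
        w = 1 - b/(q + 1)                            # Bartlett kernel weight
        # A_b = sum_t x_t u_t u_{t-b} x'_{t-b}; add its transpose as in the slide
        A = X[b:].T @ ((uhat[b:]*uhat[:-b])[:, None] * X[:-b])
        S += w * (A + A.T)
    return XtX_inv @ S @ XtX_inv

rng = np.random.default_rng(7)
T = 100
X = np.column_stack([np.ones(T), rng.normal(0, 1, T)])
u = rng.normal(0, 1, T)
V0 = newey_west_cov(X, u, 0)     # q = 0: reduces to the White estimator
V4 = newey_west_cov(X, u, 4)     # q = 4: adds four autocorrelation terms
print(V0.round(4), V4.round(4))
```

In practice uhat would be the OLS residuals of the model under study; here a plain noise vector is used just to exercise the formula.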
Nonnormal error terms
• Assumption B4: The error terms are normally distributed
• This assumption is necessary
  – to derive the normality of β̂
  – to derive the t-distribution of the t-statistic
  – to derive the F-distribution of the F-statistic
• Remember that β̂ is a linear estimator
β̂ = (X′X)⁻¹X′y = Cy
• For a single component of β̂ we find
β̂_k = Σ_{t=1}^T c_kt y_t
• The random variables y_1, …, y_T are stochastically independent
• Hence, β̂_k is the sum of independent (but not identically distributed) random variables
• Question: How is the sum of random variables distributed?
• Central limit theorem: The sum of many i.i.d. random variables is approximately normally distributed
• The central limit theorem does also hold for nonidentical distributions: β̂_k is approximately normally distributed, even if the error terms are nonnormal
• Further: β̂ is approximately multivariate normal,
β̂ ≈ N(β, σ²(X′X)⁻¹)
• Careful: There are some (weak) regularity conditions that must be satisfied; normality can break down (but usually does not)
Simulation [b4.R]:
• Gratuity example (from last semester):
y_t = 0.5 + 0.1·x_t + u_t
satisfying all A-, B-, C-assumptions apart from B4
• Distribution of error terms: f_u(u) = exp(−(u + 1)) for u ≥ −1
• Since β̂ ≈ N(β, σ²(X′X)⁻¹), we find for single components β̂_k of β̂
(β̂_k − β_k)/SE(β̂_k) →d U ~ N(0, 1)
for k = 1, …, K
• Confidence intervals and t-tests are asymptotically valid (use quantiles of N(0,1) instead of the t-distribution)
• F-tests are asymptotically valid (convergence to χ²-distribution)
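The b4.R script itself is not shown in the transcript, so the following Python sketch reconstructs the simulation idea under the stated design y_t = 0.5 + 0.1x_t + u_t with shifted-exponential errors (mean 0, variance 1); the regressor design is an assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
T, reps = 200, 2000
x = rng.uniform(0, 10, T)                      # fixed regressors across replications
X = np.column_stack([np.ones(T), x])
C = np.linalg.inv(X.T @ X) @ X.T               # beta_hat = C y

betas = np.empty(reps)
for i in range(reps):
    u = rng.exponential(1.0, T) - 1.0          # f(u) = exp(-(u+1)) on u >= -1
    y = 0.5 + 0.1*x + u
    betas[i] = (C @ y)[1]                      # slope estimate

se = np.sqrt(1.0 / np.sum((x - x.mean())**2))  # true sd of slope (sigma = 1)
coverage = np.mean(np.abs(betas - 0.1) <= 1.96*se)
print(round(betas.mean(), 3), round(coverage, 3))
```

Despite the skewed error distribution, the slope estimates cluster around 0.1 and the normal-based 95% interval has close to 95% coverage, illustrating the CLT argument above.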
Stochastic convergence and limit theorems
• Convergence of real sequences: Let a₁, a₂, … be a sequence of real numbers
• Definition: The sequence {a_n} converges to its limit a, if for any (arbitrarily small) ε > 0 there is a number N(ε) such that |a_n − a| < ε for all n ≥ N(ε)
• Notation: lim_{n→∞} a_n = a or a_n → a
• Examples:
lim_{n→∞} 1/n = 0
lim_{n→∞} (n² + n + 6)/(3n² − 2n + 2) = 1/3
• [Graph of the convergent sequence (n² + n + 6)/(3n² − 2n + 2)]
Questions:
• How can the idea of convergence be transferred to sequences of random variables?
• What is a sequence of random variables?
• What does convergence of sequences of random variables mean?
• Which sequences of random variables do we typically encounter in econometrics?
• Definition: Let X₁, X₂, … be random variables
X_i: Ω → ℝ
We call X₁, X₂, … a sequence of random variables
• X₁, X₂, … are (countably infinitely many) random variables
• Formally, this is a sequence of functions (not of real numbers)
• Definition: The sequence X₁, X₂, … converges almost surely (fast sicher) to a random variable X, if
P({ω : lim_{n→∞} X_n(ω) = X(ω)}) = 1
• Notation:
X_n →f.s. X or X_n →a.s. X
• This kind of convergence is only of minor importance in econometrics
• Definition: The sequence X₁, X₂, … converges in probability (nach Wahrscheinlichkeit) to a random variable X, if
lim_{n→∞} P(|X_n − X| < ε) = 1
• Notation:
X_n →p X or plim X_n = X
• This kind of convergence is very important in econometrics
• Special case: Convergence in probability to a constant
• The sequence X₁, X₂, … converges in probability to a constant a, if
lim_{n→∞} P(|X_n − a| < ε) = 1
• Notation:
X_n →p a or plim X_n = a
• In econometrics we usually need this kind of convergence in probability
• Definition: The sequence X₁, X₂, … (with distribution functions F₁, F₂, …) converges in distribution, in law (nach Verteilung) to a random variable X (with distribution function F), if
lim_{n→∞} F_n(x) = F(x)
for all x ∈ ℝ where F(x) is continuous
• Notation: X_n →d X
• Relation between types of convergence:
X_n →a.s. X ⇒ X_n →p X ⇒ X_n →d X
• Limit theorems: laws of large numbers (LLN, Gesetze der großen Zahl); central limit theorems (CLT, zentrale Grenzwertsätze)
• Let X₁, X₂, … be a sequence of random variables
• Define a new sequence X̄₁, X̄₂, … where
X̄_n = (1/n)Σ_{i=1}^n X_i
• Define another new sequence Z₁, Z₂, … where
Z_n = (S_n − E(S_n))/√Var(S_n) with S_n = Σ_{i=1}^n X_i
Strong law of large numbers (SLLN)
• Let X₁, X₂, … be a sequence of independent random variables with μ_i = E(X_i) < ∞ and Var(X_i) < ∞ for i = 1, 2, …
• If Σ_{k=1}^∞ Var(X_k)/k² < ∞, then
P( lim_{n→∞} (X̄_n − (1/n)Σ_{i=1}^n μ_i) = 0 ) = 1
• Special case: iid sequences, X̄_n →a.s. μ
Weak law of large numbers (Chebyshev, WLLN)
• Let X₁, X₂, … be a sequence of independent random variables with μ_i = E(X_i) < ∞ and Var(X_i) < c < ∞
• Then
lim_{n→∞} P( |X̄_n − (1/n)Σ_{i=1}^n μ_i| < ε ) = 1
• Special case: iid sequences, plim X̄_n = μ
Weak law of large numbers (Khinchin)
• Let X₁, X₂, … be a sequence of iid random variables with E(X_i) = μ
• Then
lim_{n→∞} P(|X̄_n − μ| < ε) = 1
• There are also laws of large numbers for stochastic processes, e.g. for martingale difference sequences
• The weak laws of large numbers can easily be generalized to the multivariate case, e.g. Khinchin:
• Let X₁, X₂, … be a sequence of iid random vectors with E(X_i) = μ
• For each component k = 1, …, K
lim_{n→∞} P(|X̄_nk − μ_k| < ε) = 1
• Notation: plim X̄_n = μ
Central limit theorem
• Let X₁, X₂, … be a sequence of random variables
• Consider the sequence of standardized cumulative sums
Z_n = (S_n − E(S_n))/√Var(S_n) with S_n = Σ_{i=1}^n X_i
• How is Z_n distributed for n → ∞?
• Impose only a few assumptions about the distribution of the X_i
Central limit theorem (Lindeberg-Levy)
• Let X₁, X₂, … be a sequence of iid random variables with E(X_i) = μ and Var(X_i) = σ² < ∞
• Let F_n(z) = P(Z_n ≤ z) denote the distribution function of Z_n
• Then
lim_{n→∞} F_n(z) = ∫_{−∞}^z (1/√(2π)) exp(−u²/2) du
• Convergence in distribution: Z_n →d Z ~ N(0, 1)
Central limit theorem (Liapunov)
• Let X₁, X₂, … be a sequence of independent random variables with E(X_i) = μ_i, Var(X_i) = σ²_i < ∞, and E(|X_i|^(2+δ)) < ∞ for (arbitrarily small) δ > 0
• Define c_n = √(Σ_{i=1}^n σ²_i)
• If
lim_{n→∞} (1/c_n^(2+δ)) Σ_{i=1}^n E(|X_i − μ_i|^(2+δ)) = 0,
then Z_n →d Z ~ N(0, 1)
• The heart of the central limit theorem: no single random variable may dominate the sum
• Each (X_i − μ_i)/c_n is only a negligibly small contribution to the sum (S_n − E(S_n))/c_n
• Frequent notation (in the iid case)
S_n ≈ N(nμ, nσ²)
X̄_n ≈ N(μ, σ²/n)
• We can deal with the sum as if it were normally distributed (if n is large enough)
• The central limit theorem also applies to empirical moments!
• Let μ_k = E(X^k) denote the k-th (theoretical) moment of X
• The k-th empirical moment
m_k = (1/n)Σ_{i=1}^n X_i^k
is an estimator for μ_k
• According to the CLT, m_k is asymptotically normal if the variance of X^k exists (i.e., the 2k-th moment μ_2k)
• The central limit theorem can easily be generalized to the multivariate case, e.g. Lindeberg-Levy:
• Let X₁, X₂, … be a sequence of iid random vectors with E(X_i) = μ and Cov(X_i) = Σ
• Then
√n(X̄_n − μ) →d Z ~ N(0, Σ)
• Remark: In the univariate case we can also write √n(X̄_n − μ) →d Z ~ N(0, σ²)
Further central limit theorems
• The assumptions about the sequence X₁, X₂, … can be weakened
• Central limit theorems for stochastic processes
• Central limit theorems for products of random variables
• Central limit theorems for maxima (extreme value theory)
Useful rules of calculus
• If plim X_n = a and plim Y_n = b, then
plim(X_n ± Y_n) = a ± b
plim(X_n Y_n) = ab
plim(X_n/Y_n) = a/b, if b ≠ 0
• If a function g is continuous at a, then
plim g(X_n) = g(a)
• If Y_n →d Z and h is a continuous function, then
h(Y_n) →d h(Z)
• Cramér's theorem: If X_n →p a and Y_n →d Z, then
X_n + Y_n →d a + Z
X_n Y_n →d aZ
• Cramér's theorem is very useful if there are unknown parameters in the asymptotic distribution that can be estimated consistently (more on consistency later)
Example for Cramér's theorem:
• Let X₁, …, X_n be a random sample from X; we know that
S*²_n = (1/(n−1))Σ_{i=1}^n (X_i − X̄)² →p σ²
S²_n = (1/n)Σ_{i=1}^n (X_i − X̄)² →p σ²
• Hence
σ/S*_n →p 1 and σ/S_n →p 1
• According to the central limit theorem
√n(X̄_n − μ)/σ →d Z ~ N(0, 1)
• Due to
√n(X̄_n − μ)/S_n = [√n(X̄_n − μ)/σ]·(σ/S_n)
and σ/S_n →p 1 we have
√n(X̄_n − μ)/S_n →d Z·1 = Z ~ N(0, 1)
• Similarly for √n(X̄_n − μ)/S*_n
• Multivariate version: According to the central limit theorem
√n(X̄_n − μ) →d Z ~ N(0, Σ)
• Due to
Σ̂_n = (1/n)Σ(X_i − X̄_n)(X_i − X̄_n)′ →p Σ
we can use the following approximation for large n:
X̄_n ≈ N(μ, Σ̂_n/n)
(Careful: the notation is bad, but it helps the intuition)
Stochastic exogenous variables
• Assumption C1: The matrix X is non-stochastic
• What happens if X is (at least partially) stochastic?
• We distinguish three cases:
1. X and u are stochastically independent
2. Contemporaneous uncorrelatedness: Cov(x_kt, u_t) = 0 for all t, k
3. X and u are contemporaneously correlated
Conditional expectation
• Let (X, Y) be jointly continuous with density function f_X,Y(x, y)
• Marginal distributions (marginal densities)
f_X(x) = ∫_{−∞}^∞ f_X,Y(x, y) dy
f_Y(y) = ∫_{−∞}^∞ f_X,Y(x, y) dx
• Conditional density of X given Y = y
f_{X|Y=y}(x) = f_X,Y(x, y)/f_Y(y)
• Conditional expectation (bedingter Erwartungswert) of X given Y = y
E(X|Y = y) = ∫_{−∞}^∞ x f_{X|Y=y}(x) dx
• Conditional expectation (bedingte Erwartung) of X given Y:
E(X|Y)
is a random variable realizing as E(X|Y = y) if Y = y
• The conditional expectation E(X|Y = y) is a real number (for given y)
• The conditional expectation E(X|Y) is a random variable
Useful rules for conditional expectations
1. Law of iterated expectations: E (E (XjY )) = E (X)
2. Independence: If X and Y are independent, then E (XjY ) = E (X)
3. Linearity: For a₁, a₂ ∈ ℝ,
E(a₁X₁ + a₂X₂|Y) = a₁E(X₁|Y) + a₂E(X₂|Y)
4. The conditioned random variables can be treated like constants,
E(f(X)g(Y)|Y) = g(Y)E(f(X)|Y)
Stochastic exogenous variables, case 1
• Model y = Xβ + u with X and u stochastically independent
• The estimators β̂ and σ̂² are unbiased and consistent
• (Estimated) covariance matrix of β̂
• Asymptotic normality: √T(β̂ − β) →d N(0, σ²_u Q⁻¹_XX)
• Conclusion: If X and u are independent there are no problems
Stochastic exogenous variables, case 2
• The error term and the exogenous variables are contemporaneously uncorrelated (but may be correlated over time)
• Typical case: lagged endogenous variables on the right hand side
• Unbiasedness is lost
• Consistency and asymptotic normality still hold
• Conclusion: If there is contemporaneous uncorrelatedness, there are hardly any problems if the sample is large enough
Stochastic exogenous variables, case 3
• Contemporaneous correlation between error terms and exogenous variables
• Example:
Why might there be contemporaneous correlation?
• Errors-in-variables:
Model: y_t = α + βx*_t + e_t
Measurement: x_t = x*_t + v_t
• Simultaneous equation systems:
c_t = α + βy_t + u_t
y_t = c_t + i_t
Instrumental variables (IV estimation)
• Model
y = Xβ + u
with contemporaneous correlation between X and u
• Instrumental variables: contemporaneously uncorrelated with u, but correlated with X
• Let Z denote the (T × (L+1))-matrix of instruments, and
P = Z(Z′Z)⁻¹Z′
• The matrix P is symmetric and idempotent, P′P = P
• Number of columns: L ≥ K (often L = K)
• Transformed model
Py = PXβ + Pu
• The least squares estimators of the transformed model are called IV estimators
β̂_IV = (X′P′PX)⁻¹X′P′Py = (X′PX)⁻¹X′Py
• If L = K then
β̂_IV = (X′Z(Z′Z)⁻¹Z′X)⁻¹X′Z(Z′Z)⁻¹Z′y
     = (Z′X)⁻¹(Z′Z)(X′Z)⁻¹X′Z(Z′Z)⁻¹Z′y
     = (Z′X)⁻¹Z′y
• Simple linear regression (L = K = 1)
β̂_IV = Σ(z_t − z̄)(y_t − ȳ) / Σ(z_t − z̄)(x_t − x̄)
Assumptions about Z
• Existing limit,
plim Z′Z/T = lim_{T→∞} E(Z′Z/T) = Q_ZZ
with Q_ZZ positive definite
• Asymptotic correlation with exogenous variables
plim Z′X/T = Q_ZX, rank(Q_ZX) = K + 1
• Asymptotic uncorrelatedness with error terms
plim Z′u/T = lim_{T→∞} E(Z′u/T) = 0
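The simple IV formula can be illustrated in the errors-in-variables setting from above; the instrument construction and all variances below are assumptions for illustration. OLS on the mismeasured regressor is attenuated towards zero, while IV recovers the true slope.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 10_000
xstar = rng.normal(0, 1, T)            # true (unobserved) regressor
x = xstar + rng.normal(0, 1, T)        # observed with measurement error v_t
z = xstar + rng.normal(0, 1, T)        # instrument: correlated with x, not with errors
y = 1 + 1.0*xstar + rng.normal(0, 1, T)

# OLS slope: plim here is beta * Var(x*)/(Var(x*)+Var(v)) = 0.5
b_ols = np.sum((x - x.mean())*(y - y.mean())) / np.sum((x - x.mean())**2)
# IV slope: sample analogue of Cov(z, y)/Cov(z, x), plim = 1
b_iv = np.sum((z - z.mean())*(y - y.mean())) / np.sum((z - z.mean())*(x - x.mean()))
print(round(b_ols, 2), round(b_iv, 2))
```

With T = 10,000 the attenuation bias of OLS is clearly visible, while the IV estimate is close to the true value 1 (consistent, though not unbiased).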
• IV estimators are consistent but not unbiased
• Hausman test (Hausman-Wu test): Hypotheses
H0: plim X′u/T = 0
H1: plim X′u/T ≠ 0
• Test idea: Under H0 both OLS and IV are consistent, under H1 only IV is consistent
• If β̂_IV deviates "too much" from β̂, reject H0
• Test statistic:
(β̂_IV − β̂)′[V̂(β̂_IV) − V̂(β̂)]⁻¹(β̂_IV − β̂)
• Asymptotic distribution under H0 is χ²_K*, where K* is the number of columns in Z that are not included in X
Multicollinearity
• Perfect vs imperfect multicollinearity
• Graphical illustration
Dynamic models
• Stochastic process: x₁, …, x_T
• Moment functions: E(x_t), Var(x_t), Cov(x_t, x_{t+τ})
• (Weak) stationarity:
E(x_t) = μ
Var(x_t) = σ²_x
Cov(x_t, x_{t+τ}) = γ_τ
• Order of integration of a process, I(d)
• Simplest dynamic model: lagged exogenous variables
y_t = α + β₀x_t + β₁x_{t−1} + … + β_K x_{t−K} + v_t
• Interpretation of the parameters (short-term and long-term multiplier)
• Problems:
  – many parameters
  – multicollinearity
  – no precise estimation of individual components β_k
• Note: The variance of the long-term multiplier may be small even if all components β̂_k have a large variance
• Functional form for β₀, β₁, …, β_K
  – Polynomial lags (Almon lags)
  – Geometric lags (Koyck lags)
Polynomial lags
• The β_k are a polynomial function of k
• Example: Quadratic function:
β_k = γ₀ + γ₁k + γ₂k²
for k = 0, …, K
• There are fewer than K parameters, since
y_t = α + Σ_{k=0}^K β_k x_{t−k} + v_t
    = α + Σ_{k=0}^K (γ₀ + γ₁k + γ₂k²) x_{t−k} + v_t
    = α + γ₀ Σ_{k=0}^K x_{t−k} + γ₁ Σ_{k=0}^K k x_{t−k} + γ₂ Σ_{k=0}^K k² x_{t−k} + v_t
    = α + γ₀ x*_1t + γ₁ x*_2t + γ₂ x*_3t + v_t
• The validity of the linear restrictions can be tested
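The construction of the transformed regressors x*_1t, x*_2t, x*_3t can be sketched directly; the γ-values, lag length and noise level are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n, K = 500, 8
x = rng.normal(0, 1, n + K)
g0, g1, g2 = 1.0, 0.5, -0.08           # assumed beta_k = g0 + g1*k + g2*k^2
beta = np.array([g0 + g1*k + g2*k**2 for k in range(K + 1)])

# matrix of lags: column k holds x_{t-k}
lags = np.column_stack([x[K - k : len(x) - k] for k in range(K + 1)])
y = 2.0 + lags @ beta + rng.normal(0, 0.2, n)

# only 3 slope parameters to estimate: regress y on the weighted lag sums
x1 = lags @ np.ones(K + 1)             # sum_k x_{t-k}
x2 = lags @ np.arange(K + 1)           # sum_k k x_{t-k}
x3 = lags @ (np.arange(K + 1)**2)      # sum_k k^2 x_{t-k}
Z = np.column_stack([np.ones(n), x1, x2, x3])
g, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(g.round(2))                      # close to [2.0, 1.0, 0.5, -0.08]
```

OLS on the three constructed regressors recovers α, γ₀, γ₁, γ₂, and hence all K + 1 lag coefficients β_k, from far fewer free parameters.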
Geometric lags
• The β_k depend on k as follows,
β_k = β₀λ^k
where 0 < λ < 1
• It is possible to set K = ∞:
y_t = α + β₀x_t + β₁x_{t−1} + β₂x_{t−2} + … + v_t
    = α + β₀x_t + β₀λx_{t−1} + β₀λ²x_{t−2} + … + v_t
• Short-term multiplier: β₀
• Long-term multiplier:
Σ_{k=0}^∞ β_k = β₀ Σ_{k=0}^∞ λ^k = β₀ · 1/(1 − λ)
• Koyck transformation:
y_t = α + β₀x_t + β₀λx_{t−1} + β₀λ²x_{t−2} + … + v_t
minus
λy_{t−1} = λα + β₀λx_{t−1} + β₀λ²x_{t−2} + … + λv_{t−1}
yields
y_t − λy_{t−1} = (α − λα) + β₀x_t + (v_t − λv_{t−1})
y_t = α₀ + β₀x_t + λy_{t−1} + u_t
with α₀ = (1 − λ)α and u_t = v_t − λv_{t−1}
• Estimation is problematic since B3 and C1 are violated
• Models with rational lag distribution
y_t = α₀ + β₀x_t + β₁x_{t−1} + … + β_K x_{t−K} + λ₁y_{t−1} + … + λ_M y_{t−M} + u_t
• Special case K = M = 1:
y_t = α₀ + β₀x_t + β₁x_{t−1} + λy_{t−1} + u_t
• From this equation we find
y_t − λy_{t−1} = α₀ + β₀x_t + β₁x_{t−1} + u_t
• Long-term (undisturbed) equilibrium
y* = α₀/(1 − λ) + [(β₀ + β₁)/(1 − λ)]x*
• Error correction formulation
Δy_t = β₀Δx_t − (1 − λ)e_{t−1} + u_t
with error (disequilibrium) term
e_{t−1} = y_{t−1} − [α₀/(1 − λ) + ((β₀ + β₁)/(1 − λ))x_{t−1}]
• If x_t and y_t are both I(1), and if e_{t−1} is I(0), then x_t and y_t are called cointegrated
Estimation of error correction models (ECM)
1. Determine the order of integration of x_t and y_t
2. Estimate by OLS
y_{t−1} = α₀/(1 − λ) + [(β₀ + β₁)/(1 − λ)]x_{t−1} + e_{t−1}
and calculate the residuals ê_{t−1}
3. Determine the order of integration of ê_{t−1}
4. If there is cointegration, estimate
Δy_t = β₀Δx_t − (1 − λ)ê_{t−1} + u_t
Interdependent equation systems
• Illustration by a simple example
• Pharmacy company: Advertisement expenditures w_t, quantity sold a_t, price p_t, advertising price (per page) q_t
• Model equations
a_t = α + β₁w_t + β₂p_t + u_t
w_t = γ + δ₁a_t + δ₂q_t + v_t
• Error terms satisfy all B-assumptions; further we assume Cov(u_t, v_t) = σ_uv and Cov(u_s, v_t) = 0 for s ≠ t
• In the first equation, u_t and w_t are correlated!
• Hence the OLS estimators are inconsistent
• Structural form vs reduced form
• From the structural form
a_t = α + β₁w_t + β₂p_t + u_t
w_t = γ + δ₁a_t + δ₂q_t + v_t
we derive the reduced form
a_t = π₁ + π₂p_t + π₃q_t + u*_t
w_t = π₄ + π₅p_t + π₆q_t + v*_t
• Reduced form: all endogenous variables are on the left hand side, all exogenous variables on the right hand side
• The equations of the reduced form can be estimated by the OLS method
• From the estimated values π̂₁, …, π̂₆ one obtains the values α̂, β̂₁, β̂₂, γ̂, δ̂₁, δ̂₂
• The estimators α̂, β̂₁, β̂₂, γ̂, δ̂₁, δ̂₂ are consistent
• It is not always possible to derive the structural parameters from the reduced form parameters (identification problem)
• From the structural form

$a_t = \alpha + \beta_1 w_t + \beta_2 p_t + u_t$
$w_t = \gamma + \delta_1 a_t + v_t$

one obtains the reduced form

$a_t = \pi_1 + \pi_2 p_t + u_t^*$
$w_t = \pi_3 + \pi_4 p_t + v_t^*$

• Five structural parameters but only four reduced-form parameters

• Sometimes there are more reduced-form parameters than structural parameters
125
• Order condition for identification (counting rule)

$\dot{K}$ = number of exogenous variables in the whole model
$K^*$ = number of exogenous variables in the equation considered
$M^*$ = number of endogenous variables in the equation considered

• An equation is

underidentified if $M^* - 1 > \dot{K} - K^*$
exactly identified if $M^* - 1 = \dot{K} - K^*$
overidentified if $M^* - 1 < \dot{K} - K^*$

• $M^* - 1$ is the number of explanatory endogenous variables (on the right-hand side); $\dot{K} - K^*$ is the number of exogenous variables in the other equations
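The counting rule can be wrapped in a small helper. A sketch: the function name and the convention of not counting intercepts are assumptions of this example, not part of the course material.

```python
def order_condition(K_dot, K_star, M_star):
    """Classify an equation by the order condition.

    K_dot  -- number of exogenous variables in the whole model
    K_star -- number of exogenous variables in this equation
    M_star -- number of endogenous variables in this equation
    """
    lhs, rhs = M_star - 1, K_dot - K_star
    if lhs > rhs:
        return "underidentified"
    if lhs == rhs:
        return "exactly identified"
    return "overidentified"

# First equation of the advertising example: endogenous a_t, w_t
# (M* = 2); exogenous p_t, q_t in the model (K_dot = 2), of which
# only p_t appears in this equation (K* = 1).
print(order_condition(2, 1, 2))  # -> exactly identified
```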
126
• Estimation of an exactly identified or overidentified equation

• Two-stage least squares (2SLS)

• Idea: obtain instrumental variables from the reduced form

• Example of the 2SLS method

• In the system

$a_t = \alpha + \beta_1 w_t + \beta_2 p_t + u_t$
$w_t = \gamma + \delta_1 a_t + \delta_2 q_t + v_t$

the second equation is to be estimated
127
• First step: estimate by OLS

$a_t = \pi_1 + \pi_2 p_t + \pi_3 q_t + u_t^*$

and obtain $\hat{a}_t = \hat{\pi}_1 + \hat{\pi}_2 p_t + \hat{\pi}_3 q_t$

• Second step: estimate by OLS

$w_t = \gamma + \delta_1 \hat{a}_t + \delta_2 q_t + v_t$

• The 2SLS estimators are consistent (IV estimators)

• The standard errors have to be adjusted

• The properties of the estimators in finite samples are complicated

128
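The two stages can be sketched on simulated data for the second equation. Everything numeric here is an assumption (structural values, error scales), and the small OLS helper solves the normal equations by Gaussian elimination; it is an illustration, not the course's own implementation.

```python
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    x = [0.0] * n
    for c in reversed(range(n)):
        x[c] = (M[c][n] - sum(M[c][j] * x[j] for j in range(c + 1, n))) / M[c][c]
    return x

def ols(rows, y):
    """OLS coefficients via the normal equations X'X b = X'y."""
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)

random.seed(1)
T = 2000
alpha, beta1, beta2 = 1.0, 0.4, -0.8     # assumed structural values
gamma, delta1, delta2 = 0.5, 0.3, -0.2
D = 1 - beta1 * delta1
a, w, p, q = [], [], [], []
for _ in range(T):
    pt, qt = random.gauss(0, 1), random.gauss(0, 1)
    ut, vt = random.gauss(0, 0.5), random.gauss(0, 0.5)
    # a_t from the reduced form, w_t from its structural equation
    at = (alpha + beta1 * gamma + beta2 * pt + beta1 * delta2 * qt
          + ut + beta1 * vt) / D
    wt = gamma + delta1 * at + delta2 * qt + vt
    a.append(at); w.append(wt); p.append(pt); q.append(qt)

# First step: regress a_t on (1, p_t, q_t), form fitted values a_hat
c = ols([[1.0, pt, qt] for pt, qt in zip(p, q)], a)
a_hat = [c[0] + c[1] * pt + c[2] * qt for pt, qt in zip(p, q)]

# Second step: regress w_t on (1, a_hat_t, q_t)
g_hat, d1_hat, d2_hat = ols([[1.0, ah, qt] for ah, qt in zip(a_hat, q)], w)
print(round(g_hat, 2), round(d1_hat, 2), round(d2_hat, 2))
```

The estimates should be close to the assumed 0.5, 0.3 and $-0.2$. Note that, as the slide says, valid standard errors require the adjustment of the residual variance; the naive second-stage standard errors are not correct.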
Interdependent equation systems in matrix notation
• General representation

• Let $M$ be the number of equations in the system

• The endogenous variables are collected in a $(T \times M)$ matrix

$\mathbf{Y} = [\mathbf{y}_1 \; \mathbf{y}_2 \; \ldots \; \mathbf{y}_M]$

• The exogenous variables (and the intercept) are collected in a $(T \times \dot{K})$ matrix

$\mathbf{X} = [\mathbf{x}_0 \; \mathbf{x}_1 \; \ldots \; \mathbf{x}_K]$
129
• The $m$-th equation is

$\mathbf{y}_m = \alpha_m \mathbf{x}_0 + \beta_{1m} \mathbf{x}_1 + \beta_{2m} \mathbf{x}_2 + \ldots + \beta_{Km} \mathbf{x}_K + \gamma_{1m} \mathbf{y}_1 + \ldots + \gamma_{m-1,m} \mathbf{y}_{m-1} + \gamma_{m+1,m} \mathbf{y}_{m+1} + \ldots + \gamma_{Mm} \mathbf{y}_M + \mathbf{u}_m$

• Setting $\gamma_{mm} = -1$ yields

$\gamma_{1m} \mathbf{y}_1 + \ldots + \gamma_{Mm} \mathbf{y}_M + \alpha_m \mathbf{x}_0 + \beta_{1m} \mathbf{x}_1 + \ldots + \beta_{Km} \mathbf{x}_K + \mathbf{u}_m = \mathbf{0}$

• Stack the coefficients in vectors

$\boldsymbol{\gamma}_m = (\gamma_{1m}, \gamma_{2m}, \ldots, \gamma_{Mm})'$
$\boldsymbol{\beta}_m = (\alpha_m, \beta_{1m}, \beta_{2m}, \ldots, \beta_{Km})'$
130
• Compact notation for the complete system

$\mathbf{Y}\boldsymbol{\gamma}_1 + \mathbf{X}\boldsymbol{\beta}_1 + \mathbf{u}_1 = \mathbf{0}$
$\vdots$
$\mathbf{Y}\boldsymbol{\gamma}_M + \mathbf{X}\boldsymbol{\beta}_M + \mathbf{u}_M = \mathbf{0}$

and accordingly

$\mathbf{Y}\boldsymbol{\Gamma} + \mathbf{X}\mathbf{B} + \mathbf{U} = \mathbf{0}$

with the $(M \times M)$, $(\dot{K} \times M)$ and $(T \times M)$ matrices

$\boldsymbol{\Gamma} = [\boldsymbol{\gamma}_1 \ldots \boldsymbol{\gamma}_M]$
$\mathbf{B} = [\boldsymbol{\beta}_1 \ldots \boldsymbol{\beta}_M]$
$\mathbf{U} = [\mathbf{u}_1 \ldots \mathbf{u}_M]$
131
• The noise terms $\mathbf{u}_m$, $m = 1, \ldots, M$, satisfy all B-assumptions

• Dependencies between noise terms of different equations are permitted

• Assumption

$E(\mathbf{u}_m \mathbf{u}_m') = \sigma_m^2 \mathbf{I}_T$ for $m = 1, \ldots, M$
$E(\mathbf{u}_m \mathbf{u}_n') = \sigma_{mn} \mathbf{I}_T$ for $m \neq n$

• How could one write these assumptions in a compact notation for the matrix $\mathbf{U}$?
132
• Reduced form (all endogenous variables on the left-hand side and all exogenous ones on the right-hand side)

• From

$\mathbf{Y}\boldsymbol{\Gamma} + \mathbf{X}\mathbf{B} + \mathbf{U} = \mathbf{0}$

it follows that

$\mathbf{Y}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{-1} + \mathbf{X}\mathbf{B}\boldsymbol{\Gamma}^{-1} + \mathbf{U}\boldsymbol{\Gamma}^{-1} = \mathbf{0}$

and accordingly

$\mathbf{Y} = \mathbf{X}\boldsymbol{\Pi} + \mathbf{V}$

with $\boldsymbol{\Pi} = -\mathbf{B}\boldsymbol{\Gamma}^{-1}$ and $\mathbf{V} = -\mathbf{U}\boldsymbol{\Gamma}^{-1}$
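The relation $\boldsymbol{\Pi} = -\mathbf{B}\boldsymbol{\Gamma}^{-1}$ can be verified with explicit matrices for the two-equation advertising example. The numeric structural values below are assumptions for illustration; the column convention ($\gamma_{mm} = -1$, rows of $\mathbf{B}$ ordered as $x_0, p_t, q_t$) follows the slides.

```python
# Hypothetical structural values for the two-equation example
alpha, beta1, beta2 = 1.0, 0.4, -0.8
gamma, delta1, delta2 = 0.5, 0.3, -0.2

# Columns of Gamma are gamma_m with gamma_mm = -1 (Y = [a w]);
# rows of B follow the exogenous ordering x_0 (intercept), p_t, q_t.
Gamma = [[-1.0, delta1],
         [beta1, -1.0]]
B = [[alpha, gamma],
     [beta2, 0.0],
     [0.0, delta2]]

# Inverse of the 2x2 matrix Gamma
det = Gamma[0][0] * Gamma[1][1] - Gamma[0][1] * Gamma[1][0]
Ginv = [[Gamma[1][1] / det, -Gamma[0][1] / det],
        [-Gamma[1][0] / det, Gamma[0][0] / det]]

# Pi = -B Gamma^{-1}, a 3x2 matrix (one column per endogenous variable)
Pi = [[-sum(B[i][k] * Ginv[k][j] for k in range(2)) for j in range(2)]
      for i in range(3)]

# Reduced-form coefficients derived earlier by direct substitution
D = 1 - beta1 * delta1
pi_direct = [[(alpha + beta1 * gamma) / D, (gamma + delta1 * alpha) / D],
             [beta2 / D, delta1 * beta2 / D],
             [beta1 * delta2 / D, delta2 / D]]

print(all(abs(Pi[i][j] - pi_direct[i][j]) < 1e-12
          for i in range(3) for j in range(2)))  # -> True
```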
133
• The structural coefficients in $\boldsymbol{\Gamma}$ and $\mathbf{B}$ are identifiable only if their values can be uniquely deduced from $\boldsymbol{\Pi}$

• Number of coefficients:

$\boldsymbol{\Pi}$: $\dot{K}M$
$\boldsymbol{\Gamma}$: $M^2 - M$
$\mathbf{B}$: $\dot{K}M$

• So one needs (at least) $M^2 - M$ appropriate restrictions on $\boldsymbol{\Gamma}$ and/or $\mathbf{B}$

• In what follows, zero restrictions are assumed

134
Estimation of interdependent equation systems
• Reduced form

$\mathbf{y}_1 = \mathbf{X}\boldsymbol{\pi}_1 + \mathbf{v}_1$
$\vdots$
$\mathbf{y}_M = \mathbf{X}\boldsymbol{\pi}_M + \mathbf{v}_M$

• OLS estimation of one equation

$\hat{\boldsymbol{\pi}}_m = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}_m$

• OLS estimation of all equations

$\hat{\boldsymbol{\Pi}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$
135
• ILS method: if equation $m$ is exactly identified, one can derive the estimators of the structural coefficients from the matrix $\hat{\boldsymbol{\Pi}}$

• If equation $m$ is exactly identified or overidentified, one uses the 2SLS method

• Reorder and partition the matrices:

$[\mathbf{y}_m \;\; \bar{\mathbf{Y}}_m \;\; \bar{\bar{\mathbf{Y}}}_m] \begin{bmatrix} -1 \\ \bar{\boldsymbol{\gamma}}_m \\ \mathbf{0} \end{bmatrix} + \mathbf{X}\boldsymbol{\beta}_m + \mathbf{u}_m = \mathbf{0}$

• $\bar{\mathbf{Y}}_m$: endogenous variables included in equation $m$; $\bar{\bar{\mathbf{Y}}}_m$: excluded endogenous variables
136
• Equation $m$ can be rewritten as

$\mathbf{y}_m = \bar{\mathbf{Y}}_m \bar{\boldsymbol{\gamma}}_m + \mathbf{X}\boldsymbol{\beta}_m + \mathbf{u}_m = [\bar{\mathbf{Y}}_m \;\; \mathbf{X}] \begin{bmatrix} \bar{\boldsymbol{\gamma}}_m \\ \boldsymbol{\beta}_m \end{bmatrix} + \mathbf{u}_m$

• First step: estimate

$\hat{\boldsymbol{\Pi}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$

and partition accordingly

$[\hat{\boldsymbol{\pi}}_m \;\; \hat{\bar{\boldsymbol{\Pi}}}_m \;\; \hat{\bar{\bar{\boldsymbol{\Pi}}}}_m] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' [\mathbf{y}_m \;\; \bar{\mathbf{Y}}_m \;\; \bar{\bar{\mathbf{Y}}}_m]$
137
• The included endogenous variables are estimated by

$\hat{\bar{\mathbf{Y}}}_m = \mathbf{X} \hat{\bar{\boldsymbol{\Pi}}}_m$

• Second step: substitute $\bar{\mathbf{Y}}_m$ by $\hat{\bar{\mathbf{Y}}}_m$ in

$\mathbf{y}_m = [\bar{\mathbf{Y}}_m \;\; \mathbf{X}] \begin{bmatrix} \bar{\boldsymbol{\gamma}}_m \\ \boldsymbol{\beta}_m \end{bmatrix} + \mathbf{u}_m$

• 2SLS estimator

$\begin{bmatrix} \hat{\bar{\boldsymbol{\gamma}}}_m^{\,2SLS} \\ \hat{\boldsymbol{\beta}}_m^{\,2SLS} \end{bmatrix} = \left( [\hat{\bar{\mathbf{Y}}}_m \;\; \mathbf{X}]' [\hat{\bar{\mathbf{Y}}}_m \;\; \mathbf{X}] \right)^{-1} [\hat{\bar{\mathbf{Y}}}_m \;\; \mathbf{X}]' \mathbf{y}_m$
138
• The covariance matrix of the estimated vector

$\begin{bmatrix} \hat{\bar{\boldsymbol{\gamma}}}_m^{\,2SLS} \\ \hat{\boldsymbol{\beta}}_m^{\,2SLS} \end{bmatrix}$

is

$\hat{\sigma}^2 \left( [\hat{\bar{\mathbf{Y}}}_m \;\; \mathbf{X}]' [\hat{\bar{\mathbf{Y}}}_m \;\; \mathbf{X}] \right)^{-1}$

with

$\hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} \left( \mathbf{y}_m - [\bar{\mathbf{Y}}_m \;\; \mathbf{X}] \begin{bmatrix} \hat{\bar{\boldsymbol{\gamma}}}_m^{\,2SLS} \\ \hat{\boldsymbol{\beta}}_m^{\,2SLS} \end{bmatrix} \right)_t^2$

and NOT

$\hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} \left( \mathbf{y}_m - [\hat{\bar{\mathbf{Y}}}_m \;\; \mathbf{X}] \begin{bmatrix} \hat{\bar{\boldsymbol{\gamma}}}_m^{\,2SLS} \\ \hat{\boldsymbol{\beta}}_m^{\,2SLS} \end{bmatrix} \right)_t^2$

i.e. the residuals are computed with the original $\bar{\mathbf{Y}}_m$, not with the first-stage fitted values
139