
ECONOMETRICS
Bruce E. Hansen
©2000, 2020
University of Wisconsin, Department of Economics
This Revision: January, 2020
Comments Welcome

This manuscript may be printed and reproduced for individual or instructional use, but may not be printed for commercial purposes.



Contents

Preface

About the Author

1 Introduction
    1.1 What is Econometrics?
    1.2 The Probability Approach to Econometrics
    1.3 Econometric Terms and Notation
    1.4 Observational Data
    1.5 Standard Data Structures
    1.6 Econometric Software
    1.7 Replication
    1.8 Data Files for Textbook
    1.9 Reading the Manuscript
    1.10 Common Symbols

I Regression

2 Conditional Expectation and Projection
    2.1 Introduction
    2.2 The Distribution of Wages
    2.3 Conditional Expectation
    2.4 Log Differences
    2.5 Conditional Expectation Function
    2.6 Continuous Variables
    2.7 Law of Iterated Expectations
    2.8 CEF Error
    2.9 Intercept-Only Model
    2.10 Regression Variance
    2.11 Best Predictor
    2.12 Conditional Variance
    2.13 Homoskedasticity and Heteroskedasticity
    2.14 Regression Derivative
    2.15 Linear CEF
    2.16 Linear CEF with Nonlinear Effects
    2.17 Linear CEF with Dummy Variables
    2.18 Best Linear Predictor
    2.19 Illustrations of Best Linear Predictor
    2.20 Linear Predictor Error Variance
    2.21 Regression Coefficients
    2.22 Regression Sub-Vectors
    2.23 Coefficient Decomposition
    2.24 Omitted Variable Bias
    2.25 Best Linear Approximation
    2.26 Regression to the Mean
    2.27 Reverse Regression
    2.28 Limitations of the Best Linear Projection
    2.29 Random Coefficient Model
    2.30 Causal Effects
    2.31 Existence and Uniqueness of the Conditional Expectation*
    2.32 Identification*
    2.33 Technical Proofs*
    Exercises

3 The Algebra of Least Squares
    3.1 Introduction
    3.2 Samples
    3.3 Moment Estimators
    3.4 Least Squares Estimator
    3.5 Solving for Least Squares with One Regressor
    3.6 Solving for Least Squares with Multiple Regressors
    3.7 Illustration
    3.8 Least Squares Residuals
    3.9 Demeaned Regressors
    3.10 Model in Matrix Notation
    3.11 Projection Matrix
    3.12 Orthogonal Projection
    3.13 Estimation of Error Variance
    3.14 Analysis of Variance
    3.15 Projections
    3.16 Regression Components
    3.17 Regression Components (Alternative Derivation)*
    3.18 Residual Regression
    3.19 Leverage Values
    3.20 Leave-One-Out Regression
    3.21 Influential Observations
    3.22 CPS Data Set
    3.23 Numerical Computation
    3.24 Collinearity Errors
    3.25 Programming
    Exercises

4 Least Squares Regression
    4.1 Introduction
    4.2 Random Sampling
    4.3 Sample Mean
    4.4 Linear Regression Model
    4.5 Mean of Least-Squares Estimator
    4.6 Variance of Least Squares Estimator
    4.7 Unconditional Moments
    4.8 Gauss-Markov Theorem
    4.9 Generalized Least Squares
    4.10 Modern Gauss-Markov Theorem
    4.11 Residuals
    4.12 Estimation of Error Variance
    4.13 Mean-Square Forecast Error
    4.14 Covariance Matrix Estimation Under Homoskedasticity
    4.15 Covariance Matrix Estimation Under Heteroskedasticity
    4.16 Standard Errors
    4.17 Covariance Matrix Estimation with Sparse Dummy Variables
    4.18 Computation
    4.19 Measures of Fit
    4.20 Empirical Example
    4.21 Multicollinearity
    4.22 Clustered Sampling
    4.23 Inference with Clustered Samples
    4.24 At What Level to Cluster?
    4.25 Technical Proofs*
    Exercises

5 Normal Regression
    5.1 Introduction
    5.2 The Normal Distribution
    5.3 Multivariate Normal Distribution
    5.4 Joint Normality and Linear Regression
    5.5 Normal Regression Model
    5.6 Distribution of OLS Coefficient Vector
    5.7 Distribution of OLS Residual Vector
    5.8 Distribution of Variance Estimator
    5.9 t-statistic
    5.10 Confidence Intervals for Regression Coefficients
    5.11 Confidence Intervals for Error Variance
    5.12 t Test
    5.13 Likelihood Ratio Test
    5.14 Information Bound for Normal Regression
    Exercises

II Large Sample Methods

6 A Review of Large Sample Asymptotics
    6.1 Introduction
    6.2 Modes of Convergence
    6.3 Weak Law of Large Numbers
    6.4 Central Limit Theorem
    6.5 Continuous Mapping Theorem and Delta Method
    6.6 Smooth Function Model
    6.7 Best Unbiased Estimation
    6.8 Stochastic Order Symbols
    6.9 Convergence of Moments
    6.10 Uniform Stochastic Bounds

7 Asymptotic Theory for Least Squares
    7.1 Introduction
    7.2 Consistency of Least-Squares Estimator
    7.3 Asymptotic Normality
    7.4 Joint Distribution
    7.5 Consistency of Error Variance Estimators
    7.6 Homoskedastic Covariance Matrix Estimation
    7.7 Heteroskedastic Covariance Matrix Estimation
    7.8 Summary of Covariance Matrix Notation
    7.9 Alternative Covariance Matrix Estimators*
    7.10 Functions of Parameters
    7.11 Best Unbiased Estimation
    7.12 Asymptotic Standard Errors
    7.13 t-statistic
    7.14 Confidence Intervals
    7.15 Regression Intervals
    7.16 Forecast Intervals
    7.17 Wald Statistic
    7.18 Homoskedastic Wald Statistic
    7.19 Confidence Regions
    7.20 Edgeworth Expansion*
    7.21 Uniformly Consistent Residuals*
    7.22 Asymptotic Leverage*
    Exercises

8 Restricted Estimation
    8.1 Introduction
    8.2 Constrained Least Squares
    8.3 Exclusion Restriction
    8.4 Finite Sample Properties
    8.5 Minimum Distance
    8.6 Asymptotic Distribution
    8.7 Variance Estimation and Standard Errors
    8.8 Efficient Minimum Distance Estimator
    8.9 Exclusion Restriction Revisited
    8.10 Variance and Standard Error Estimation
    8.11 Hausman Equality
    8.12 Example: Mankiw, Romer and Weil (1992)
    8.13 Misspecification
    8.14 Nonlinear Constraints
    8.15 Inequality Restrictions
    8.16 Technical Proofs*
    Exercises

9 Hypothesis Testing
    9.1 Hypotheses
    9.2 Acceptance and Rejection
    9.3 Type I Error
    9.4 t tests
    9.5 Type II Error and Power
    9.6 Statistical Significance
    9.7 P-Values
    9.8 t-ratios and the Abuse of Testing
    9.9 Wald Tests
    9.10 Homoskedastic Wald Tests
    9.11 Criterion-Based Tests
    9.12 Minimum Distance Tests
    9.13 Minimum Distance Tests Under Homoskedasticity
    9.14 F Tests
    9.15 Hausman Tests
    9.16 Score Tests
    9.17 Problems with Tests of Nonlinear Hypotheses
    9.18 Monte Carlo Simulation
    9.19 Confidence Intervals by Test Inversion
    9.20 Multiple Tests and Bonferroni Corrections
    9.21 Power and Test Consistency
    9.22 Asymptotic Local Power
    9.23 Asymptotic Local Power, Vector Case
    Exercises

10 Resampling Methods
    10.1 Introduction
    10.2 Example
    10.3 Jackknife Estimation of Variance
    10.4 Example
    10.5 Jackknife for Clustered Observations
    10.6 The Bootstrap Algorithm
    10.7 Bootstrap Variance and Standard Errors
    10.8 Percentile Interval
    10.9 The Bootstrap Distribution
    10.10 The Distribution of the Bootstrap Observations
    10.11 The Distribution of the Bootstrap Sample Mean
    10.12 Bootstrap Asymptotics
    10.13 Consistency of the Bootstrap Estimate of Variance
    10.14 Trimmed Estimator of Bootstrap Variance
    10.15 Unreliability of Untrimmed Bootstrap Standard Errors
    10.16 Consistency of the Percentile Interval
    10.17 Bias-Corrected Percentile Interval
    10.18 BCa Percentile Interval
    10.19 Percentile-t Interval
    10.20 Percentile-t Asymptotic Refinement
    10.21 Bootstrap Hypothesis Tests
    10.22 Wald-Type Bootstrap Tests
    10.23 Criterion-Based Bootstrap Tests
    10.24 Parametric Bootstrap
    10.25 How Many Bootstrap Replications?
    10.26 Setting the Bootstrap Seed
    10.27 Bootstrap Regression
    10.28 Bootstrap Regression Asymptotic Theory
    10.29 Wild Bootstrap
    10.30 Bootstrap for Clustered Observations
    10.31 Technical Proofs*
    Exercises

III Multiple Equation Models

11 Multivariate Regression
    11.1 Introduction
    11.2 Regression Systems
    11.3 Least-Squares Estimator
    11.4 Mean and Variance of Systems Least-Squares
    11.5 Asymptotic Distribution
    11.6 Covariance Matrix Estimation
    11.7 Seemingly Unrelated Regression
    11.8 Equivalence of SUR and Least-Squares
    11.9 Maximum Likelihood Estimator
    11.10 Restricted Estimation
    11.11 Reduced Rank Regression
    11.12 Principal Component Analysis
    11.13 PCA with Additional Regressors
    11.14 Factor-Augmented Regression
    Exercises

12 Instrumental Variables
    12.1 Introduction
    12.2 Overview
    12.3 Examples
    12.4 Instruments
    12.5 Example: College Proximity
    12.6 Reduced Form
    12.7 Reduced Form Estimation
    12.8 Identification
    12.9 Instrumental Variables Estimator
    12.10 Demeaned Representation
    12.11 Wald Estimator
    12.12 Two-Stage Least Squares
    12.13 Limited Information Maximum Likelihood
    12.14 JIVE
    12.15 Consistency of 2SLS
    12.16 Asymptotic Distribution of 2SLS
    12.17 Determinants of 2SLS Variance
    12.18 Covariance Matrix Estimation
    12.19 LIML Asymptotic Distribution
    12.20 Functions of Parameters
    12.21 Hypothesis Tests
    12.22 Finite Sample Theory
    12.23 Bootstrap for 2SLS
    12.24 The Peril of Bootstrap 2SLS Standard Errors
    12.25 Clustered Dependence
    12.26 Generated Regressors
    12.27 Regression with Expectation Errors
    12.28 Control Function Regression
    12.29 Endogeneity Tests
    12.30 Subset Endogeneity Tests
    12.31 OverIdentification Tests
    12.32 Subset OverIdentification Tests
    12.33 Bootstrap Overidentification Tests
    12.34 Local Average Treatment Effects
    12.35 Identification Failure
    12.36 Weak Instruments
    12.37 Many Instruments
    12.38 Testing for Weak Instruments
    12.39 Weak Instruments with k2 > 1
    12.40 Example: Acemoglu, Johnson and Robinson (2001)
    12.41 Example: Angrist and Krueger (1991)
    12.42 Programming
    Exercises

    13 Generalized Method of Moments 43613.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43613.2 Moment Equation Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43613.3 Method of Moments Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43713.4 Overidentified Moment Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43813.5 Linear Moment Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43913.6 GMM Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43913.7 Distribution of GMM Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44013.8 Efficient GMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44113.9 Efficient GMM versus 2SLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44213.10 Estimation of the Efficient Weight Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44313.11 Iterated GMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44413.12 Covariance Matrix Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44413.13 Clustered Dependence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44413.14 Wald Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445


13.15 Restricted GMM
13.16 Nonlinear Restricted GMM
13.17 Constrained Regression
13.18 Multivariate Regression
13.19 Distance Test
13.20 Continuously-Updated GMM
13.21 Overidentification Test
13.22 Subset Overidentification Tests
13.23 Endogeneity Test
13.24 Subset Endogeneity Test
13.25 Nonlinear GMM
13.26 Bootstrap for GMM
13.27 Conditional Moment Equation Models
13.28 Technical Proofs*
Exercises

IV Dependent and Panel Data

14 Time Series
14.1 Introduction
14.2 Examples
14.3 Differences and Growth Rates
14.4 Stationarity
14.5 Transformations of Stationary Processes
14.6 Convergent Series
14.7 Ergodicity
14.8 Ergodic Theorem
14.9 Conditioning on Information Sets
14.10 Martingale Difference Sequences
14.11 CLT for Martingale Differences
14.12 Mixing
14.13 CLT for Correlated Observations
14.14 Linear Projection
14.15 White Noise
14.16 The Wold Decomposition
14.17 Linear Models
14.18 Moving Average Processes
14.19 Infinite-Order Moving Average Process
14.20 Lag Operator
14.21 First-Order Autoregressive Process
14.22 Unit Root and Explosive AR(1) Processes
14.23 Second-Order Autoregressive Process
14.24 AR(p) Processes
14.25 Impulse Response Function
14.26 ARMA and ARIMA Processes
14.27 Mixing Properties of Linear Processes
14.28 Identification


14.29 Estimation of Autoregressive Models
14.30 Asymptotic Distribution of Least Squares Estimator
14.31 Distribution Under Homoskedasticity
14.32 Asymptotic Distribution Under General Dependence
14.33 Covariance Matrix Estimation
14.34 Covariance Matrix Estimation Under General Dependence
14.35 Testing the Hypothesis of No Serial Correlation
14.36 Testing for Omitted Serial Correlation
14.37 Model Selection
14.38 Illustrations
14.39 Time Series Regression Models
14.40 Static, Distributed Lag, and Autoregressive Distributed Lag Models
14.41 Time Trends
14.42 Illustration
14.43 Granger Causality
14.44 Testing for Serial Correlation in Regression Models
14.45 Bootstrap for Time Series
14.46 Technical Proofs*
Exercises

15 Multivariate Time Series
15.1 Introduction
15.2 Multiple Equation Time Series Models
15.3 Linear Projection
15.4 Multivariate Wold Decomposition
15.5 Impulse Response
15.6 VAR(1) Model
15.7 VAR(p) Model
15.8 Regression Notation
15.9 Estimation
15.10 Asymptotic Distribution
15.11 Covariance Matrix Estimation
15.12 Selection of Lag Length in a VAR
15.13 Illustration
15.14 Predictive Regressions
15.15 Impulse Response Estimation
15.16 Local Projection Estimator
15.17 Regression on Residuals
15.18 Orthogonalized Shocks
15.19 Orthogonalized Impulse Response Function
15.20 Orthogonalized Impulse Response Estimation
15.21 Illustration
15.22 Forecast Error Decomposition
15.23 Identification of Recursive VARs
15.24 Oil Price Shocks
15.25 Structural VARs
15.26 Identification of Structural VARs


15.27 Long-Run Restrictions
15.28 Blanchard and Quah (1989) Illustration
15.29 External Instruments
15.30 Dynamic Factor Models
15.31 Technical Proofs*
Exercises

16 Nonstationary Time Series
16.1 Introduction
16.2 Trend Stationarity
16.3 Autoregressive Unit Roots
16.4 Cointegration
16.5 Cointegrated VARs

17 Panel Data
17.1 Introduction
17.2 Time Indexing and Unbalanced Panels
17.3 Notation
17.4 Pooled Regression
17.5 One-Way Error Component Model
17.6 Random Effects
17.7 Fixed Effect Model
17.8 Within Transformation
17.9 Fixed Effects Estimator
17.10 Differenced Estimator
17.11 Dummy Variables Regression
17.12 Fixed Effects Covariance Matrix Estimation
17.13 Fixed Effects Estimation in Stata
17.14 Between Estimator
17.15 Feasible GLS
17.16 Intercept in Fixed Effects Regression
17.17 Estimation of Fixed Effects
17.18 GMM Interpretation of Fixed Effects
17.19 Identification in the Fixed Effects Model
17.20 Asymptotic Distribution of Fixed Effects Estimator
17.21 Asymptotic Distribution for Unbalanced Panels
17.22 Heteroskedasticity-Robust Covariance Matrix Estimation
17.23 Heteroskedasticity-Robust Estimation – Unbalanced Case
17.24 Hausman Test for Random vs Fixed Effects
17.25 Random Effects or Fixed Effects?
17.26 Time Trends
17.27 Two-Way Error Components
17.28 Instrumental Variables
17.29 Identification with Instrumental Variables
17.30 Asymptotic Distribution of Fixed Effects 2SLS Estimator
17.31 Linear GMM
17.32 Estimation with Time-Invariant Regressors


17.33 Hausman-Taylor Model
17.34 Jackknife Covariance Matrix Estimation
17.35 Panel Bootstrap
17.36 Dynamic Panel Models
17.37 The Bias of Fixed Effects Estimation
17.38 Anderson-Hsiao Estimator
17.39 Arellano-Bond Estimator
17.40 Weak Instruments
17.41 Dynamic Panels with Predetermined Regressors
17.42 Blundell-Bond Estimator
17.43 Forward Orthogonal Transformation
17.44 Empirical Illustration
Exercises

18 Difference in Differences
18.1 Introduction
18.2 Minimum Wage in New Jersey
18.3 Identification
18.4 Multiple Units
18.5 Do Police Reduce Crime?
18.6 Trend Specification
18.7 Do Blue Laws Affect Liquor Sales?
18.8 Check Your Code: Does Abortion Impact Crime?
18.9 Inference
Exercises

V Nonparametric and Nonlinear Methods

19 Nonparametric Regression
19.1 Introduction
19.2 Binned Means Estimator
19.3 Kernel Regression
19.4 Local Linear Estimator
19.5 Local Polynomial Estimator
19.6 Asymptotic Bias
19.7 Asymptotic Variance
19.8 AIMSE
19.9 Boundary Bias
19.10 Reference Bandwidth
19.11 Nonparametric Residuals and Prediction Errors
19.12 Cross-Validation Bandwidth Selection
19.13 Asymptotic Distribution
19.14 Undersmoothing
19.15 Conditional Variance Estimation
19.16 Variance Estimation and Standard Errors
19.17 Confidence Bands
19.18 The Local Nature of Kernel Regression


19.19 Application to Wage Regression
19.20 Clustered Observations
19.21 Application to Testscores
19.22 Multiple Regressors
19.23 Curse of Dimensionality
19.24 Computation
19.25 Technical Proofs*
Exercises

20 Series Regression
20.1 Introduction
20.2 Polynomial Regression
20.3 Illustrating Polynomial Regression
20.4 Orthogonal Polynomials
20.5 Splines
20.6 Illustrating Spline Regression
20.7 The Global/Local Nature of Series Regression
20.8 Stone-Weierstrass and Jackson Approximation Theory
20.9 Regressor Bounds
20.10 Matrix Convergence
20.11 Consistent Estimation
20.12 Convergence Rate
20.13 Asymptotic Normality
20.14 Regression Estimation
20.15 Undersmoothing
20.16 Residuals and Regression Fit
20.17 Cross-Validation Model Selection
20.18 Variance and Standard Error Estimation
20.19 Clustered Observations
20.20 Confidence Bands
20.21 Uniform Approximations
20.22 Partially Linear Model
20.23 Panel Fixed Effects
20.24 Multiple Regressors
20.25 Additively Separable Models
20.26 Nonparametric Instrumental Variables Regression
20.27 NPIV Identification
20.28 NPIV Convergence Rate
20.29 Nonparametric vs Parametric Identification
20.30 Example: Angrist and Lavy (1999)
20.31 Technical Proofs*
Exercises

21 Regression Discontinuity

22 Nonlinear Econometric Models
22.1 Introduction
22.2 Nonlinear Least Squares


22.3 Least Absolute Deviations
22.4 Quantile Regression
22.5 Limited Dependent Variables
22.6 Binary Choice
22.7 Count Data
22.8 Censored Data
22.9 Sample Selection
Exercises

23 Machine Learning
23.1 Introduction
23.2 Model Selection
23.3 Bayesian Information Criterion
23.4 Akaike Information Criterion for Regression
23.5 Akaike Information Criterion for Likelihood
23.6 Mallows Criterion
23.7 Cross-Validation Criterion
23.8 K-Fold Cross-Validation
23.9 Many Selection Criteria are Similar
23.10 Relation with Likelihood Ratio Testing
23.11 Consistent Selection
23.12 Asymptotic Selection Optimality
23.13 Focused Information Criterion
23.14 Best Subset and Stepwise Regression
23.15 The MSE of Model Selection Estimators
23.16 Inference After Model Selection
23.17 Empirical Illustration
23.18 Shrinkage Methods
23.19 James-Stein Shrinkage Estimator
23.20 Interpretation of the Stein Effect
23.21 Positive Part Estimator
23.22 Shrinkage Towards Restrictions
23.23 Group James-Stein
23.24 Empirical Illustrations
23.25 Model Averaging
23.26 Smoothed BIC and AIC
23.27 Mallows Model Averaging
23.28 Jackknife (CV) Model Averaging
23.29 Empirical Illustration
23.30 Ridge Regression
23.31 LASSO
23.32 Computation of the LASSO Estimator
23.33 Elastic Net
23.34 Regression Sample Splitting
23.35 Regression Trees
23.36 Bagging
23.37 Random Forests
23.38 Ensembling


23.39 Technical Proofs*
Exercises

Appendices

A Matrix Algebra
A.1 Notation
A.2 Complex Matrices*
A.3 Matrix Addition
A.4 Matrix Multiplication
A.5 Trace
A.6 Rank and Inverse
A.7 Orthogonal and Orthonormal Matrices
A.8 Determinant
A.9 Eigenvalues
A.10 Positive Definite Matrices
A.11 Idempotent Matrices
A.12 Singular Values
A.13 Matrix Decompositions
A.14 Generalized Eigenvalues
A.15 Extrema of Quadratic Forms
A.16 Cholesky Decomposition
A.17 QR Decomposition
A.18 Solving Linear Systems
A.19 Algorithmic Matrix Inversion
A.20 Matrix Calculus
A.21 Kronecker Products and the Vec Operator
A.22 Vector Norms
A.23 Matrix Norms

B Useful Inequalities
B.1 Inequalities for Real Numbers
B.2 Inequalities for Vectors
B.3 Inequalities for Matrices
B.4 Probability Inequalities
B.5 Proofs*

    References 872

  • Preface

    This textbook is the second in a two-part series covering the core material typically taught in a one-year Ph.D. course in econometrics. The sequence is

    1. Statistical Theory for Economists (first volume, abbreviated as STFE)

    2. Econometrics (this volume)

    The textbooks are written as an integrated series. Each volume is reasonably self-contained, but each builds on the material introduced in the previous volume(s).

    This volume assumes that students have a background in multivariate calculus, probability theory, linear algebra, and mathematical statistics. A prior course in undergraduate econometrics would be helpful but is not required. Two excellent undergraduate textbooks are Wooldridge (2015) and Stock and Watson (2014). The relevant background in probability theory and mathematical statistics is provided in Statistical Theory for Economists.

    For reference, the basic tools of matrix algebra and probability inequalities are reviewed in the Appendix.

    For students wishing to deepen their knowledge of matrix algebra in relation to their study of econometrics, I recommend Matrix Algebra by Abadir and Magnus (2005).

    For further study in econometrics beyond this text, I recommend White (1984) and Davidson (1994) for asymptotic theory, Hamilton (1994) and Kilian and Lütkepohl (2017) for time series methods, Cameron and Trivedi (2005) and Wooldridge (2010) for panel data and discrete response models, and Li and Racine (2007) for nonparametric and semiparametric econometrics. Beyond these texts, the Handbook of Econometrics series provides advanced summaries of contemporary econometric methods and theory.

    Alternative PhD-level econometrics textbooks include Theil (1971), Amemiya (1985), Judge, Griffiths, Hill, Lütkepohl, and Lee (1985), Goldberger (1991), Davidson and MacKinnon (1993), Johnston and DiNardo (1997), Davidson (2000), Hayashi (2000), Ruud (2000), Davidson and MacKinnon (2004), Greene (2017) and Magnus (2017). For a focus on applied methods see Angrist and Pischke (2009).

    The end-of-chapter exercises are important parts of the text and are meant to help teach students of econometrics. Answers are not provided, and this is intentional.

    I would like to thank Ying-Ying Lee and Wooyoung Kim for providing research assistance in preparing some of the numerical analysis, graphics, and empirical examples presented in the text.

    This is a manuscript in progress. Parts I-III are nearly complete. Parts IV and V are incomplete, in particular Chapters 16, 21, 22, and 23.

    xv

  • About the Author

    Bruce E. Hansen is the Mary Claire Aschenbrenner Phipps Distinguished Chair of Economics at the University of Wisconsin-Madison. Bruce is originally from Los Angeles, California, has an undergraduate degree in economics from Occidental College, and a Ph.D. in economics from Yale University. He previously taught at the University of Rochester and Boston College.

    Bruce is a Fellow of the Econometric Society, the Journal of Econometrics, and the International Association of Applied Econometrics. He has served as Co-Editor of Econometric Theory and as Associate Editor of Econometrica. He has published 62 papers in refereed journals which have received over 30,000 citations.

    xvi

  • Chapter 1

    Introduction

    1.1 What is Econometrics?

    The term "econometrics" is believed to have been crafted by Ragnar Frisch (1895-1973) of Norway, one of the three principal founders of the Econometric Society, first editor of the journal Econometrica, and co-winner of the first Nobel Memorial Prize in Economic Sciences in 1969. It is therefore fitting that we turn to Frisch's own words in the introduction to the first issue of Econometrica to describe the discipline.

    A word of explanation regarding the term econometrics may be in order. Its definition is implied in the statement of the scope of the [Econometric] Society, in Section I of the Constitution, which reads: "The Econometric Society is an international society for the advancement of economic theory in its relation to statistics and mathematics.... Its main object shall be to promote studies that aim at a unification of the theoretical-quantitative and the empirical-quantitative approach to economic problems...."

    But there are several aspects of the quantitative approach to economics, and no single one of these aspects, taken by itself, should be confounded with econometrics. Thus, econometrics is by no means the same as economic statistics. Nor is it identical with what we call general economic theory, although a considerable portion of this theory has a definitely quantitative character. Nor should econometrics be taken as synonymous with the application of mathematics to economics. Experience has shown that each of these three viewpoints, that of statistics, economic theory, and mathematics, is a necessary, but not by itself a sufficient, condition for a real understanding of the quantitative relations in modern economic life. It is the unification of all three that is powerful. And it is this unification that constitutes econometrics.

    Ragnar Frisch, Econometrica, (1933), 1, pp. 1-2.

    This definition remains valid today, although some terms have evolved somewhat in their usage. Today, we would say that econometrics is the unified study of economic models, mathematical statistics, and economic data.

    Within the field of econometrics there are sub-divisions and specializations. Econometric theory concerns the development of tools and methods, and the study of the properties of econometric methods. Applied econometrics is a term describing the development of quantitative economic models and the application of econometric methods to these models using economic data.

    1

  • CHAPTER 1. INTRODUCTION 2

    1.2 The Probability Approach to Econometrics

    The unifying methodology of modern econometrics was articulated by Trygve Haavelmo (1911-1999) of Norway, winner of the 1989 Nobel Memorial Prize in Economic Sciences, in his seminal paper "The probability approach in econometrics" (1944). Haavelmo argued that quantitative economic models must necessarily be probability models (by which today we would mean stochastic). Deterministic models are blatantly inconsistent with observed economic quantities, and it is incoherent to apply deterministic models to non-deterministic data. Economic models should be explicitly designed to incorporate randomness; stochastic errors should not be simply added to deterministic models to make them random. Once we acknowledge that an economic model is a probability model, it follows naturally that an appropriate way to quantify, estimate, and conduct inference about the economy is through the powerful theory of mathematical statistics. The appropriate method for a quantitative economic analysis follows from the probabilistic construction of the economic model.

    Haavelmo's probability approach was quickly embraced by the economics profession. Today no quantitative work in economics shuns its fundamental vision.

    While all economists embrace the probability approach, there has been some evolution in its implementation.

    The structural approach is the closest to Haavelmo's original idea. A probabilistic economic model is specified, and the quantitative analysis performed under the assumption that the economic model is correctly specified. Researchers often describe this as "taking their model seriously". The structural approach typically leads to likelihood-based analysis, including maximum likelihood and Bayesian estimation.

    A criticism of the structural approach is that it is misleading to treat an economic model as correctly specified. Rather, it is more accurate to view a model as a useful abstraction or approximation. In this case, how should we interpret structural econometric analysis? The quasi-structural approach to inference views a structural economic model as an approximation rather than the truth. This theory has led to the concepts of the pseudo-true value (the parameter value defined by the estimation problem), the quasi-likelihood function, quasi-MLE, and quasi-likelihood inference.

    Closely related is the semiparametric approach. A probabilistic economic model is partially specified but some features are left unspecified. This approach typically leads to estimation methods such as least-squares and the Generalized Method of Moments. The semiparametric approach dominates contemporary econometrics, and is the main focus of this textbook.

    Another branch of quantitative structural economics is the calibration approach. Similar to the quasi-structural approach, the calibration approach interprets structural models as approximations and hence inherently false. The difference is that the calibrationist literature rejects mathematical statistics (deeming classical theory as inappropriate for approximate models) and instead selects parameters by matching model and data moments using non-statistical ad hoc¹ methods.

    Trygve Haavelmo

    The founding ideas of the field of econometrics are largely due to the Norwegian econometrician Trygve Haavelmo (1911-1999). His advocacy of probability models revolutionized the field, and his use of formal mathematical reasoning laid the foundation for subsequent generations. He was awarded the Nobel Memorial Prize in Economic Sciences in 1989.

    ¹Ad hoc means "for this purpose" – a method designed for a specific problem – and not based on a generalizable principle.


    1.3 Econometric Terms and Notation

    In a typical application, an econometrician has a set of repeated measurements on a set of variables. For example, in a labor application the variables could include weekly earnings, educational attainment, age, and other descriptive characteristics. We call this information the data, dataset, or sample.

    We use the term observations to refer to the distinct repeated measurements on the variables. An individual observation often corresponds to a specific economic unit, such as a person, household, corporation, firm, organization, country, state, city or other geographical region. An individual observation could also be a measurement at a point in time, such as quarterly GDP or a daily interest rate.

    Economists typically denote variables by the italicized roman characters $y$, $x$, and/or $z$. The convention in econometrics is to use the character $y$ to denote the variable to be explained, while the characters $x$ and $z$ are used to denote the conditioning (explaining) variables.

    Following mathematical convention, real numbers (elements of the real line $\mathbb{R}$, also called scalars) are written using lower case italics such as $x$, and vectors (elements of $\mathbb{R}^k$) by lower case bold italics such as $\boldsymbol{x}$, e.g.

$$
\boldsymbol{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix}.
$$

Upper case bold italics such as $\boldsymbol{X}$ are used for matrices.

    We denote the number of observations by the natural number $n$, and subscript the variables by the index $i$ to denote the individual observation, e.g. $y_i$, $\boldsymbol{x}_i$ and $\boldsymbol{z}_i$. In some contexts we use indices other than $i$, such as in time series applications where the index $t$ is common. In panel studies we typically use the double index $it$ to refer to individual $i$ at time period $t$.

    The $i$th observation is the set $(y_i, \boldsymbol{x}_i, \boldsymbol{z}_i)$. The sample is the set $\{(y_i, \boldsymbol{x}_i, \boldsymbol{z}_i) : i = 1, \ldots, n\}$.

    It is proper mathematical practice to use upper case $X$ for random variables and lower case $x$ for realizations or specific values. Since we use upper case to denote matrices, the distinction between random variables and their realizations is not rigorously followed in econometric notation. Thus the notation $y_i$ will in some places refer to a random variable, and in other places a specific realization. This is undesirable but there is little to be done about it without terrifically complicating the notation. Hopefully there will be no confusion as the use should be evident from the context.

    We typically use Greek letters such as $\beta$, $\theta$ and $\sigma^2$ to denote unknown parameters of an econometric model, and use boldface, e.g. $\boldsymbol{\beta}$ or $\boldsymbol{\theta}$, when these are vector-valued. Estimators are typically denoted by putting a hat "^", tilde "~" or bar "-" over the corresponding letter, e.g. $\widehat{\beta}$ and $\widetilde{\beta}$ are estimators of $\beta$.

    The covariance matrix of an econometric estimator will typically be written using the capital boldface $\boldsymbol{V}$, often with a subscript to denote the estimator, e.g. $\boldsymbol{V}_{\widehat{\beta}} = \operatorname{var}[\widehat{\boldsymbol{\beta}}]$ as the covariance matrix for $\widehat{\boldsymbol{\beta}}$. Hopefully without causing confusion, we will use the notation $\boldsymbol{V}_{\beta} = \operatorname{avar}[\widehat{\boldsymbol{\beta}}]$ to denote the asymptotic covariance matrix of $\sqrt{n}\,(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta})$ (the variance of the asymptotic distribution). Estimators will be denoted by appending hats or tildes, e.g. $\widehat{\boldsymbol{V}}_{\beta}$ is an estimator of $\boldsymbol{V}_{\beta}$.


    1.4 Observational Data

    A common econometric question is to quantify the causal impact of one set of variables on another variable. For example, a concern in labor economics is the returns to schooling – the change in earnings induced by increasing a worker's education, holding other variables constant. Another issue of interest is the earnings gap between men and women.

    Ideally, we would use experimental data to answer these questions. To measure the returns to schooling, an experiment might randomly divide children into groups, mandate different levels of education to the different groups, and then follow the children's wage path after they mature and enter the labor force. The differences between the groups would be direct measurements of the effects of different levels of education. However, experiments such as this would be widely condemned as immoral! Consequently, in economics non-laboratory experimental data sets are typically narrow in scope.

    Instead, most economic data is observational. To continue the above example, through data collection we can record the level of a person's education and their wage. With such data we can measure the joint distribution of these variables, and assess the joint dependence. But from observational data it is difficult to infer causality, as we are not able to manipulate one variable to see the direct effect on the other. For example, a person's level of education is (at least partially) determined by that person's choices. These choices are likely to be affected by their personal abilities and attitudes towards work. The fact that a person is highly educated suggests a high level of ability, which suggests a high relative wage. This is an alternative explanation for an observed positive correlation between educational levels and wages. High ability individuals do better in school, and therefore choose to attain higher levels of education, and their high ability is the fundamental reason for their high wages. The point is that multiple explanations are consistent with a positive correlation between schooling levels and wages. Knowledge of the joint distribution alone may not be able to distinguish between these explanations.
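    The ability story can be made concrete with a small simulation. This is a hypothetical sketch, not an empirical claim: all coefficients below are made up, and schooling is designed to have no causal effect on wages at all. Yet schooling and wages come out positively correlated, because latent ability drives both.

```python
import random

random.seed(1)

# Hypothetical confounding design: unobserved "ability" raises both
# schooling and log wages. Schooling has NO causal effect on wages here;
# the coefficients are invented purely for illustration.
n = 10_000
ability = [random.gauss(0, 1) for _ in range(n)]
education = [12 + 2 * a + random.gauss(0, 1) for a in ability]
log_wage = [2.5 + 0.5 * a + random.gauss(0, 0.3) for a in ability]

def correlation(u, v):
    """Sample correlation coefficient between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

# Strongly positive despite zero causal effect of education on wages.
print(f"corr(education, log wage) = {correlation(education, log_wage):.2f}")
```

    The joint distribution of (education, log wage) generated this way is indistinguishable, without further assumptions, from one in which education does causally raise wages.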

    Most economic data sets are observational, not experimental. This means that all variables must be treated as random and possibly jointly determined.

    This discussion means that it is difficult to infer causality from observational data alone. Causal inference requires identification, and this is based on strong assumptions. We will discuss these issues on occasion throughout the text.

    1.5 Standard Data Structures

    There are five major types of economic data sets: cross-sectional, time series, panel, clustered, and spatial. They are distinguished by the dependence structure across observations.

    Cross-sectional data sets have one observation per individual. Surveys and administrative records are a typical source for cross-sectional data. In typical applications, the individuals surveyed are persons, households, firms or other economic agents. In many contemporary econometric cross-section studies the sample size $n$ is quite large. It is conventional to assume that cross-sectional observations are mutually independent. Most of this text is devoted to the study of cross-section data.

    Time series data are indexed by time. Typical examples include macroeconomic aggregates, prices and interest rates. This type of data is characterized by serial dependence. Most aggregate economic data is only available at a low frequency (annual, quarterly or perhaps monthly) so the sample size is typically much smaller than in cross-section studies. An exception is financial data where data are available at a high frequency (weekly, daily, hourly, or by transaction) so sample sizes can be quite large.


    Panel data combines elements of cross-section and time series. These data sets consist of a set of individuals (typically persons, households, or corporations) measured repeatedly over time. The common modeling assumption is that the individuals are mutually independent of one another, but a given individual's observations are mutually dependent. In some panel data contexts, the number of time series observations $T$ per individual is small while the number of individuals $n$ is large. In other panel data contexts (for example when countries or states are taken as the unit of measurement) the number of individuals $n$ can be small while the number of time series observations $T$ can be moderately large. An important issue in econometric panel data is the treatment of error components.
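    The panel double index can be sketched in a few lines of code. This is a minimal illustration with placeholder values (the numbers `100*i + t` are made up and carry no economic meaning): each observation is addressed by the pair (i, t), so fixing t yields a cross-section and fixing i yields an individual time series.

```python
# Minimal sketch of panel structure: n individuals observed over T time
# periods, stored as a mapping from the double index (i, t) to y_it.
# The value 100*i + t is a placeholder, not real data.
n, T = 3, 2
panel = {(i, t): 100 * i + t for i in range(1, n + 1) for t in range(1, T + 1)}

# Fixing t gives a cross-section; fixing i gives an individual time series.
cross_section_t1 = {i: panel[(i, 1)] for i in range(1, n + 1)}
time_series_i2 = {t: panel[(2, t)] for t in range(1, T + 1)}
print(cross_section_t1)  # {1: 101, 2: 201, 3: 301}
print(time_series_i2)    # {1: 201, 2: 202}
```
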

    Clustered samples are increasingly popular in applied economics and are related to panel data. In clustered sampling, the observations are grouped into "clusters" which are treated as mutually independent yet allowed to be dependent within the cluster. The major difference with panel data is that clustered sampling typically does not explicitly model error component structures, nor the dependence within clusters, but rather is concerned with inference which is robust to arbitrary forms of within-cluster correlation.

    Spatial dependence is another model of interdependence. The observations are treated as mutually dependent according to a spatial measure (for example, geographic proximity). Unlike clustering, spatial models allow all observations to be mutually dependent, and typically rely on explicit modeling of the dependence relationships. Spatial dependence can also be viewed as a generalization of time series dependence.

    Data Structures

    • Cross-section

    • Time-series

    • Panel

    • Clustered

    • Spatial

    As we mentioned above, most of this text will be devoted to cross-sectional data under the assumption of mutually independent observations. By mutual independence we mean that the $i$th observation $(y_i, \boldsymbol{x}_i, \boldsymbol{z}_i)$ is independent of the $j$th observation $(y_j, \boldsymbol{x}_j, \boldsymbol{z}_j)$ for $i \neq j$. In this case we say that the data are independently distributed. (Sometimes the label "independent" is misconstrued. It is a statement about the relationship between observations $i$ and $j$, not a statement about the relationship between $y_i$ and $\boldsymbol{x}_i$ and/or $\boldsymbol{z}_i$.)

    Furthermore, if the data is randomly gathered, it is reasonable to model each observation as a draw from the same probability distribution. In this case we say that the data are identically distributed. If the observations are mutually independent and identically distributed, we say that the observations are independent and identically distributed, i.i.d., or a random sample. For most of this text we will assume that our observations come from a random sample.
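    A random sample in this sense can be sketched as n independent draws from one fixed joint distribution. The particular distribution used below (x standard normal, y built as 1 + 2x plus normal noise) is a hypothetical choice purely for illustration; any fixed joint distribution F would serve.

```python
import random

random.seed(42)

# Sketch of an i.i.d. random sample: n independent draws (y_i, x_i)
# from the same joint distribution F. The specific F here is invented
# for illustration: x ~ N(0,1), y = 1 + 2x + e with e ~ N(0,1).
n = 5
sample = []  # the sample {(y_i, x_i) : i = 1, ..., n}
for _ in range(n):
    x = random.gauss(0, 1)
    y = 1 + 2 * x + random.gauss(0, 1)
    sample.append((y, x))

for i, (y, x) in enumerate(sample, start=1):
    print(f"observation {i}: y = {y:+.3f}, x = {x:+.3f}")
```

    Each pass through the loop is one independent draw, which is exactly what "identically distributed" plus "mutually independent" requires.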

    Definition 1.1 The observations $(y_i, \boldsymbol{x}_i, \boldsymbol{z}_i)$ are a sample from the distribution $F$ if they are identically distributed across $i = 1, \ldots, n$ with joint distribution $F$.


    Definition 1.2 The observations $(y_i, \boldsymbol{x}_i, \boldsymbol{z}_i)$ are a random sample if they are mutually independent and identically distributed (i.i.d.) across $i = 1, \ldots, n$.

    In the random sampling framework, we think of an individual observation $(y_i, \boldsymbol{x}_i, \boldsymbol{z}_i)$ as a realization from a joint probability distribution $F(y, \boldsymbol{x}, \boldsymbol{z})$ which we can call the population. This "population" is infinitely large. This abstraction can be a source of confusion as it does not correspond to a physical population in the real world. It is an abstraction since the distribution $F$ is unknown, and the goal of statistical inference is to learn about features of $F$ from the sample. The assumption of random sampling provides the mathematical foundation for treating economic statistics with the tools of mathematical statistics.

    The random sampling framework was a major intellectual breakthrough of the late 19th century, allowing the application of mathematical statistics to the social sciences. Before this conceptual development, methods from mathematical statistics had not been applied to economic data as the latter was viewed as non-random. The random sampling framework enabled economic samples to be treated as random, a necessary precondition for the application of statistical methods.

    1.6 Econometric Software

    Economists use a variety of econometric, statistical, and programming software.

    Stata (www.stata.com) is a powerful statistical program with a broad set of pre-programmed econometric and statistical tools. It is quite popular among economists, and is continuously being updated with new methods. It is an excellent package for most econometric analysis, but is limited when you want to use new or less-common econometric methods which have not yet been programmed. At many points in this textbook specific Stata estimation methods and commands are described. These commands are valid for Stata version 15.

    MATLAB (www.mathworks.com), GAUSS (www.aptech.com), and OxMetrics (www.oxmetrics.net) are high-level matrix programming languages with a wide variety of built-in statistical functions. Many econometric methods have been programmed in these languages and are available on the web. The advantage of these packages is that you are in complete control of your analysis, and it is easier to program new methods than in Stata. Some disadvantages are that you have to do much of the programming yourself, programming complicated procedures takes significant time, and programming errors are hard to prevent and difficult to detect and eliminate. Of these languages, GAUSS used to be quite popular among econometricians, but currently MATLAB is more popular.

    An intermediate choice is R (www.r-project.org). R has the capabilities of the above high-level matrix programming languages, but also has many built-in statistical environments which can replicate much of the functionality of Stata. R is the dominant programming language in the statistics field, so methods developed in that arena are most commonly available in R. Uniquely, R is open-source, user-contributed, and best of all, completely free! A smaller but growing group of econometricians are enthusiastic fans of R.

    For highly-intensive computational tasks, some economists write their programs in a standard programming language such as Fortran or C. This can lead to major gains in computational speed, at the cost of increased time in programming and debugging.

    There are many other packages used by econometricians, including EViews, Gretl, PcGive, Python, Julia, RATS, and SAS.

    As the packages described above have distinct advantages, many empirical economists end up using more than one package. As a student of econometrics, you will learn at least one of these packages, and probably more than one. My advice is that all students of econometrics should develop a basic level of familiarity with Stata, and either MATLAB or R (or all three).

    1.7 Replication

    Scientific research needs to be documented and replicable. For social science research using observational data, this requires careful documentation and archiving of the research methods, data manipulations, and coding.

    The best practice is as follows. Accompanying each published paper an author should create a complete replication package (set of data files, documentation, and program code files). This package should contain the source (raw) data used for analysis, and code which executes the empirical analysis and other numerical work reported in the paper. In most cases this is a set of programs which may need to be executed sequentially. (For example, there may be an initial program which