
Chapter 20

Generalized Method of Moments Estimators and Identification

“A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines.”1

—RALPH WALDO EMERSON (1803–1882)

Despite Emerson’s contempt for foolish consistency, consistency is not a foolish property for an estimator. Indeed, its absence would seem troubling. Who would want to admit that his or her estimator wouldn’t get the right answer with “all the data in the world”? Consequently, a widely applicable method for devising consistent estimators is a useful tool. This section develops two such tools, the method of moments and the generalized method of moments; these are the most recently popularized estimation techniques presented in this book. In addition to its usefulness in applications, method of moments estimation provides a natural bridge to a fundamental concept in econometrics, identification, which we’ll study in the next section.

20.1 Method of Moments Estimators

The ungeneralized method of moments harkens back to Chapter 2, where we sought estimators for a straight line through the origin. How were we to construct estimators from a sample of data? The method of moments offers a straightforward strategy for devising estimators that prove consistent under very general assumptions. Just as our intuition led to several estimators in Chapter 2, the method of moments may also lead to multiple estimators. In such cases, we turn to the generalized method of moments to settle on a single consistent estimator.


Web Extension 10


Moments

Statisticians call the expected values of variables or of products of variables moments. $E(X_i)$, $E(\varepsilon_i)$, $E(\varepsilon_i^2)$, $E(X_i\varepsilon_i)$, and $E(X_i\varepsilon_i^2)$ are all examples of moments. We commonly make assumptions about moments in our data-generating processes. For example, we might assume

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i,$$

$$E(\varepsilon_i) = 0,$$

and

$$E(X_i\varepsilon_i) = 0,$$

as is true under the Gauss–Markov Assumptions for a straight line with unknown slope and intercept. (Notice that in this example, the explanators need not be fixed across samples. The assumption that $E(X_i\varepsilon_i) = 0$ is weaker than the assumption that the $X_i$ are fixed across samples.)

The Ungeneralized Method of Moments

Method of moments estimation devises an estimator by insisting that a moment expectation that is true in the population holds true exactly for the corresponding moment within a given sample. That is, the method of moments insists that something true on average of the disturbances in the population be true on average about the residuals in any one sample. In our example, the mean residual in the sample is forced to be zero, and the covariance of the X’s and the residuals in the sample is forced to be zero, analogously to the expected value of the disturbances in the population and the covariance of X and the disturbances in the population both being zero. The method of moments estimators for $\beta_0$ and $\beta_1$, $\tilde{\beta}_0$ and $\tilde{\beta}_1$, in our example are, therefore, found by solving

$$\frac{1}{n}\sum \tilde{e}_i = \frac{1}{n}\sum (Y_i - \tilde{\beta}_0 - \tilde{\beta}_1 X_i) = 0 \tag{20.1}$$

and

$$\frac{1}{n}\sum X_i\tilde{e}_i = \frac{1}{n}\sum X_i(Y_i - \tilde{\beta}_0 - \tilde{\beta}_1 X_i) = 0, \tag{20.2}$$

the within-sample versions of the moment conditions assumed in the DGP. The $\tilde{e}_i$ are the residuals obtained using the estimators $\tilde{\beta}_0$ and $\tilde{\beta}_1$. In this example, these relationships happen to be the same relationships required to minimize the sum of squared residuals, as we learned in Chapter 5. Solving these equations for $\tilde{\beta}_0$ and $\tilde{\beta}_1$, therefore, leads to $\hat{\beta}_0$ and $\hat{\beta}_1$. The method of moments estimators in this example coincide with the ordinary least squares (OLS) estimators, $\hat{\beta}_0$ and $\hat{\beta}_1$.
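The claim that the two sample moment conditions reproduce OLS can be checked numerically. The following is a minimal sketch, not part of the original text: the function names, the simulated data, and the parameter values (intercept 2.0, slope 0.5) are all assumptions chosen for illustration.

```python
import random


def method_of_moments_line(x, y):
    """Solve the two sample moment conditions for (b0, b1).

    mean(y - b0 - b1*x) = 0 and mean(x * (y - b0 - b1*x)) = 0
    are two linear equations in the two unknowns b0 and b1.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    mean_xx = sum(xi * xi for xi in x) / n
    mean_xy = sum(xi * yi for xi, yi in zip(x, y)) / n
    # b0 + b1*mean_x = mean_y  and  b0*mean_x + b1*mean_xx = mean_xy
    b1 = (mean_xy - mean_x * mean_y) / (mean_xx - mean_x ** 2)
    b0 = mean_y - b1 * mean_x
    return b0, b1


def ols_line(x, y):
    """Textbook OLS formulas, written in deviations from means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Hypothetical simulated sample: true intercept 2.0, true slope 0.5.
rng = random.Random(0)
x = [rng.uniform(0.0, 10.0) for _ in range(200)]
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

b0_mm, b1_mm = method_of_moments_line(x, y)
b0_ols, b1_ols = ols_line(x, y)

# Both sample moments of the residuals should be zero up to rounding error.
residuals = [yi - b0_mm - b1_mm * xi for xi, yi in zip(x, y)]
mean_resid = sum(residuals) / len(residuals)
mean_x_resid = sum(xi * ei for xi, ei in zip(x, residuals)) / len(residuals)

print("method of moments:", b0_mm, b1_mm)
print("OLS:              ", b0_ols, b1_ols)
print("sample moments:   ", mean_resid, mean_x_resid)
```

The two pairs of estimates should agree up to floating-point rounding, because the moment conditions are the OLS normal equations divided by n.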

The method of moments estimators are consistent. An intuition for their consistency is that the Law of Large Numbers says that under suitable conditions Equations 20.1 and 20.2 are almost surely very close to correct if $\tilde{\beta}_0$ and $\tilde{\beta}_1$ are replaced by $\beta_0$ and $\beta_1$. Thus, if the DGP satisfies the conditions for the Law of Large Numbers, the method of moments estimators tend to coincide with the true parameter values as the sample size grows without bound. Appendix 20.A more formally shows the consistency of the method of moments estimator in this model. In general, method of moments estimators are consistent whenever the Law of Large Numbers ensures that the sample moments in the DGP converge in probability to the corresponding population moments.

The Generalized Method of Moments

Our method of moments estimation of a line with unknown intercept and slope exploited two moment conditions embedded in the Gauss–Markov Assumptions. But what would we do to estimate the slope of a line through the origin? The two moment restrictions are still $E(\varepsilon_i) = 0$ and $E(X_i\varepsilon_i) = 0$, but because there is only one parameter to estimate, the slope, the method of moments provides a surfeit of riches. It tells us to devise an estimate of $\beta$ from

$$\frac{1}{n}\sum \tilde{e}_i = \frac{1}{n}\sum (Y_i - \tilde{\beta} X_i) = 0$$

and

$$\frac{1}{n}\sum X_i\tilde{e}_i = \frac{1}{n}\sum X_i(Y_i - \tilde{\beta} X_i) = 0.$$

But with two equations and only one unknown, solving both of these equations for $\tilde{\beta}$ yields two estimators, not one. The first equation yields

$$\tilde{\beta} = \frac{\frac{1}{n}\sum Y_i}{\frac{1}{n}\sum X_i} = \frac{\sum Y_i}{\sum X_i} = \beta_{g2},$$

whereas the second equation yields

$$\tilde{\beta} = \frac{\frac{1}{n}\sum X_i Y_i}{\frac{1}{n}\sum X_i^2} = \frac{\sum X_i Y_i}{\sum X_i^2} = \beta_{g4}.$$
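The two estimators for a line through the origin, and the sample moment each one sets to zero, can be illustrated with a short sketch. The data are simulated and the names (`beta_g2`, `beta_g4`, `sample_moments`) and the true slope of 3.0 are assumptions made only for this example.

```python
import random


def sample_moments(beta, x, y):
    """Return ((1/n) sum e_i, (1/n) sum x_i e_i) for residuals e_i = y_i - beta*x_i."""
    n = len(x)
    resid = [yi - beta * xi for xi, yi in zip(x, y)]
    first = sum(resid) / n
    second = sum(xi * ei for xi, ei in zip(x, resid)) / n
    return first, second


# Hypothetical simulated sample from a line through the origin, true slope 3.0.
rng = random.Random(1)
x = [rng.uniform(1.0, 5.0) for _ in range(500)]
y = [3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Estimator from the first moment condition: ratio of sums.
beta_g2 = sum(y) / sum(x)
# Estimator from the second moment condition: sum(xy) / sum(x^2).
beta_g4 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

moments_g2 = sample_moments(beta_g2, x, y)
moments_g4 = sample_moments(beta_g4, x, y)

print("beta_g2:", beta_g2, "sample moments:", moments_g2)
print("beta_g4:", beta_g4, "sample moments:", moments_g4)
```

Each estimator drives its own sample moment to zero (up to rounding) and leaves the other moment at whatever value the data imply.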


Both $\beta_{g2}$ and $\beta_{g4}$ are method of moments estimators of the slope of a line through the origin under the Gauss–Markov Assumptions (or in any DGP in which $E(\varepsilon_i) = 0$ and $E(X_i\varepsilon_i) = 0$, including some in which the explanators are not fixed). Notice that in satisfying one of two moment restrictions, neither $\beta_{g2}$ nor $\beta_{g4}$ satisfies both moment restrictions. When $\beta_{g2}$ is the estimator, the mean residual is zero, so $\frac{1}{n}\sum \tilde{e}_i$ matches its corresponding population moment. But when $\beta_{g2}$ is the estimator, $\frac{1}{n}\sum X_i\tilde{e}_i$ does not equal zero (except by occasional accident), so this second sample moment does not equal its population counterpart. When $\beta_{g4}$ is the estimator, $\frac{1}{n}\sum X_i\tilde{e}_i$ equals zero, but $\frac{1}{n}\sum \tilde{e}_i$ does not (except by occasional accident). Except in the accidental case in which $\beta_{g2}$ and $\beta_{g4}$ are equal, neither method of moments estimator makes both sample moments equal to their population counterparts.

The generalized method of moments provides an estimation strategy when the number of restricted moments in the DGP exceeds the number of parameters to be estimated. Rather than satisfy one moment condition and violate another, the generalized method of moments (GMM) strategy chooses an estimator that balances each population moment condition against the others, seeking residuals that trade off violations of one moment restriction against violations of the other moment restrictions. A GMM estimator may satisfy no one moment condition, but it may come close to satisfying them all.

In the example of a line through the origin, the BLUE property of $\beta_{g4}$ makes it clear which is the most preferred method of moments estimator. We should choose our estimator so that

$$\frac{1}{n}\sum X_i\tilde{e}_i = \frac{1}{n}\sum X_i(Y_i - \tilde{\beta} X_i) = 0$$

and ignore the restriction that

$$\frac{1}{n}\sum \tilde{e}_i = \frac{1}{n}\sum (Y_i - \tilde{\beta} X_i) = 0.$$

This strategy yields $\beta_{g4}$, which we know to be BLUE. With more complex DGPs, the optimal choice is not so obvious. When the choice among method of moments estimators is not clear, GMM offers a strategy for devising a single estimator.

GMM does not require that the sample moments equal the population moments. For example, in the simple case of a line through the origin with two moment restrictions, the GMM estimator does not insist that

$$\frac{1}{n}\sum \tilde{e}_i = \frac{1}{n}\sum (Y_i - \tilde{\beta} X_i) = 0$$


and

$$\frac{1}{n}\sum X_i\tilde{e}_i = \frac{1}{n}\sum X_i(Y_i - \tilde{\beta} X_i) = 0.$$

Such insistence would be futile; generally, no estimator can satisfy both restrictions. Nor does GMM insist that one sample moment or the other equal zero. Instead, GMM looks at how much an estimator makes the sample moments differ from their population counterparts, as in

$$\frac{1}{n}\sum e_i^* = \frac{1}{n}\sum (Y_i - \beta^* X_i) = \nu_1$$

and

$$\frac{1}{n}\sum X_i e_i^* = \frac{1}{n}\sum X_i(Y_i - \beta^* X_i) = \nu_2,$$

where $\nu_1$ and $\nu_2$ are the amounts by which the first and second moment restrictions are violated by the residuals implied by $\beta^*$. One estimation strategy that would yield consistent estimators would be to minimize $(\nu_1^2 + \nu_2^2)$. This strategy would aim to make as small as possible the squared deviations of the sample moments from their population analogs. This strategy shares an intuitive foundation with ordinary least squares, but here the goal is to make small not the squared residuals, but the squared deviations from the population moment restrictions. This reasonable approach is not the one followed by GMM.

GMM modifies the strategy of minimizing $(\nu_1^2 + \nu_2^2)$ much as generalized least squares modifies ordinary least squares. Instead of minimizing the unweighted sum of squared deviations, $(\nu_1^2 + \nu_2^2)$, GMM minimizes a weighted sum of the squared deviations, in which the weights reflect the variances and covariances of the $\nu_i$. GMM does not guarantee an efficient estimator, but it does provide a consistent estimator, and its weighting scheme is more efficient than the simpler unweighted scheme. GMM provides a powerful tool for finding consistent estimators in models that are otherwise mathematically quite cumbersome.
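The contrast between the unweighted and weighted objectives can be sketched for the line through the origin. The text does not say how the weights are computed, so this sketch assumes one common two-step choice: the weighting matrix is the inverse of the sample second-moment matrix of the moment contributions (e_i, X_i e_i), evaluated at the first-step residuals. That choice, all function names, and the simulated data are assumptions.

```python
import random


def bilinear(u, W, v):
    """Return u' W v for 2-vectors u, v and a 2x2 matrix W."""
    return (u[0] * (W[0][0] * v[0] + W[0][1] * v[1])
            + u[1] * (W[1][0] * v[0] + W[1][1] * v[1]))


def violations(beta, x, y):
    """nu_1 and nu_2: the two sample moments evaluated at beta."""
    n = len(x)
    resid = [yi - beta * xi for xi, yi in zip(x, y)]
    nu1 = sum(resid) / n
    nu2 = sum(xi * ei for xi, ei in zip(x, resid)) / n
    return nu1, nu2


def objective(beta, x, y, W):
    """Weighted sum of squared (and cross) violations: nu' W nu."""
    nu = violations(beta, x, y)
    return bilinear(nu, W, nu)


def minimize_objective(x, y, W):
    """Closed-form minimiser. nu(beta) = a - beta*b is linear in beta,
    so the first-order condition gives beta = (b' W a) / (b' W b)."""
    n = len(x)
    a = (sum(y) / n, sum(xi * yi for xi, yi in zip(x, y)) / n)
    b = (sum(x) / n, sum(xi * xi for xi in x) / n)
    return bilinear(b, W, a) / bilinear(b, W, b)


# Hypothetical simulated sample: line through the origin, true slope 3.0.
rng = random.Random(3)
n = 500
x = [rng.uniform(1.0, 5.0) for _ in range(n)]
y = [3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Step 1: unweighted objective nu_1^2 + nu_2^2 (identity weights).
identity = [[1.0, 0.0], [0.0, 1.0]]
beta_unweighted = minimize_objective(x, y, identity)

# Step 2 (assumed weighting): invert the second-moment matrix of (e_i, x_i e_i).
resid = [yi - beta_unweighted * xi for xi, yi in zip(x, y)]
s11 = sum(e * e for e in resid) / n
s12 = sum(xi * e * e for xi, e in zip(x, resid)) / n
s22 = sum(xi * xi * e * e for xi, e in zip(x, resid)) / n
det = s11 * s22 - s12 * s12
W_hat = [[s22 / det, -s12 / det], [-s12 / det, s11 / det]]
beta_weighted = minimize_objective(x, y, W_hat)

print("unweighted:", beta_unweighted)
print("weighted:  ", beta_weighted)
```

Neither estimate sets either sample moment exactly to zero; each trades off the two violations, and only the weights differ.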

20.2 Identification

We have just seen that when the DGP has more moment conditions than parameters to be estimated, we have a surfeit of riches. The problem isn’t finding a consistent estimator, but choosing among several method of moments estimators. But what if there are fewer moment restrictions than parameters to be estimated? When there are too few restrictions in the DGP to allow consistent estimation of some or all parameters, we say the parameters are underidentified. When there are more restrictions than necessary to estimate the parameters consistently, we say the parameters are overidentified. Underidentified parameters cannot be estimated consistently.

Underidentified Parameters

The following example helps us understand underidentification. Suppose that in the DGP for a straight line with an unknown intercept term, the X-values are not fixed across samples, but rather X is a random variable. If X is a random variable, it might be correlated with the disturbance term. For example, in a wage equation in which education and experience are the only explanators, the disturbance contains all other influences on wages, such as punctuality, diligence, and native intelligence. If the education one attains is correlated with those same traits, then our explanator education is correlated with the disturbances. In general, if the explanators are correlated with the disturbances, we cannot say that $E(X_i\varepsilon_i) = 0$. In such a case, we have only one moment condition, $E(\varepsilon_i) = 0$, but two parameters to estimate.

With only one moment condition and two parameters to estimate, we can still choose estimates of $\beta_0$ and $\beta_1$ that make the population moment condition true in our sample:

$$\frac{1}{n}\sum \tilde{e}_i = \frac{1}{n}\sum (Y_i - \tilde{\beta}_0 - \tilde{\beta}_1 X_i) = 0.$$

But with one equation and two unknowns, we can choose any value for $\tilde{\beta}_0$ and then compute the value for $\tilde{\beta}_1$ that makes the moment condition true. Conversely, we could choose any value for $\tilde{\beta}_1$ and then compute the value for $\tilde{\beta}_0$ that makes the moment condition true. Such arbitrary parameter estimates do not have the property of consistency.2 When the DGP offers too few restrictions to pin down the parameters of interest, we say the relationship’s parameters are underidentified.
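The arbitrariness of estimates under underidentification can be seen in a small sketch: with only the mean-residual condition available, any slope whatsoever can be paired with an intercept that satisfies it. The simulated data, in which X is correlated with the disturbance, and all names are assumptions for illustration.

```python
import random


def intercept_given_slope(slope, x, y):
    """Intercept that makes the mean residual zero for a chosen slope."""
    n = len(x)
    return sum(y) / n - slope * sum(x) / n


def mean_residual(intercept, slope, x, y):
    n = len(x)
    return sum(yi - intercept - slope * xi for xi, yi in zip(x, y)) / n


# Hypothetical sample in which X is correlated with the disturbance u.
rng = random.Random(2)
u = [rng.gauss(0.0, 1.0) for _ in range(200)]
x = [5.0 + rng.gauss(0.0, 1.0) + ui for ui in u]
y = [2.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]

# Arbitrary candidate slopes, none of them privileged by the single moment condition.
candidate_slopes = [-5.0, 0.0, 0.5, 10.0]
pairs = [(intercept_given_slope(s, x, y), s) for s in candidate_slopes]
mean_resids = [mean_residual(b0, b1, x, y) for b0, b1 in pairs]

for (b0, b1), m in zip(pairs, mean_resids):
    print("slope", b1, "intercept", b0, "mean residual", m)
```

Every candidate pair satisfies the one sample moment condition, which is why that condition alone cannot single out the true slope and intercept.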

Exactly Identified Parameters

In contrast to the DGP with unknown slope and intercept, in the DGP for a straight line through the origin, abandoning the population restriction that $E(X_i\varepsilon_i) = 0$ does not pose a problem for consistent estimation. With only one parameter to be estimated, the one moment restriction, $E(\varepsilon_i) = 0$, provides a basis for consistent estimation. Indeed, in this DGP, abandoning the assumption that $E(X_i\varepsilon_i) = 0$ relieves the surfeit of riches that plagued us earlier. With one restriction and one parameter, there will be only one method of moments estimator for the slope of the line through the origin, $\beta_{g2}$. In a classic application, presented in Chapter 2, Milton Friedman estimated the marginal propensity to consume from permanent income using $\beta_{g2}$ because he believed that $E(\varepsilon_i) = 0$ was true in his model and that $E(X_i\varepsilon_i) = 0$ was not. When the restrictions in the DGP imply a single GMM estimator for each parameter, we say the parameters of the relationship are exactly identified, or just identified.

Overidentified Parameters

Exact identification permits consistent estimation. Underidentification makes consistent estimation impossible. What about overidentification? When the restrictions in the DGP yield a surfeit of riches, with more restrictions than parameters to be estimated, we say the parameters of the relationship are overidentified. At first look, overidentification and underidentification seem similar in their consequences. In neither case do the moment restrictions of the DGP yield unique estimators of the relationship’s parameters. However, more deeply, the two phenomena are dramatically different. The several (or many) estimators offered by overidentification will all converge in probability to the same result; all the estimators are consistent. The multitude of estimates possible with underidentification, however, remain arbitrary and in conflict with one another, even in infinite samples.

In the face of overidentification, we are pressed to choose among the multitude of consistent estimators. GMM is one common strategy for settling on one consistent estimator from among many. GMM is not generally asymptotically efficient, however. Later in this extension, we encounter an efficient estimator, called the maximum likelihood estimator, which, when it is applicable, is preferable to GMM when estimating overidentified relationships. The problem posed by overidentification is modest. We can proceed in the face of overidentification confident that all consistent estimators give similar results in very large samples. In contrast, in the face of underidentification, we cannot consistently estimate the parameters of interest at all.
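The claim that competing estimators of an overidentified parameter agree in large samples can be illustrated by simulation. This sketch, with assumed names and an assumed true slope of 2.0, computes the two method of moments estimators for a line through the origin (the ratio of sums and the sum of cross products over the sum of squares) at increasing sample sizes, in a DGP where both moment restrictions hold.

```python
import random


def slope_estimators(n, rng, true_slope=2.0):
    """Draw n observations and return the two method of moments slope estimates."""
    sum_x = sum_y = sum_xx = sum_xy = 0.0
    for _ in range(n):
        xi = rng.uniform(1.0, 5.0)
        yi = true_slope * xi + rng.gauss(0.0, 1.0)
        sum_x += xi
        sum_y += yi
        sum_xx += xi * xi
        sum_xy += xi * yi
    # First estimator uses E(e) = 0; second uses E(Xe) = 0.
    return sum_y / sum_x, sum_xy / sum_xx


rng = random.Random(4)
sizes = [100, 10_000, 200_000]
results = {n: slope_estimators(n, rng) for n in sizes}

for n in sizes:
    first, second = results[n]
    print(n, first, second, "gap:", abs(first - second))
```

In a typical run the gap between the two estimates shrinks as n grows, though with random draws it need not fall at every step.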

Exclusion and Covariance Restrictions

The examples of a straight line through the origin and a straight line with unknown intercept and slope illustrate two fundamental ways in which econometricians achieve identification of parameters. In the DGP

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad i = 1, \ldots, n$$

and

$$E(\varepsilon_i) = 0 \qquad i = 1, \ldots, n,$$

(Text continues after the following box.)


An Econometric Top 40: A Pop Tune

Competing in the New Economy

“How do managerial decisions such as whether or not to adopt a Total Quality Management (TQM) system or to expand an employee involvement program affect labor productivity? Does the implementation of ‘high performance’ workplace practices ensure better firm performance? Does the presence of a union hinder or enhance the probability of success associated with implementing these practices? Do computers really help workers be more productive?”3 Economists Sandra Black of the Federal Reserve Bank of New York and Lisa Lynch of Tufts University posed these pressing contemporary questions in 2001. Their informative econometric study provides answers of interest to many corporate managers.

Black and Lynch study a sample of more than 600 U.S. manufacturing establishments, each observed once per year by the Census Bureau between 1987 and 1994. In 1994 the data on workplace practices were gathered from the establishments, along with data on inputs and outputs; in the earlier years, only data on inputs and outputs were gathered.

The authors embedded their study in the context of a Cobb–Douglas production function for manufacturing firms:

$$\ln(Y/L)_i = \beta_{0i} + \beta_1 \ln(K/L)_i + \beta_2 \ln(M/L)_i + \varepsilon_i,$$

where Y is output, L is labor, K is capital, and M is materials. Notice that this function differs from most we have seen in that the intercept varies from firm to firm. The authors augmented the Cobb–Douglas specification by assuming that worker productivity (measured by Y/L, output per worker) depends on establishment-specific workplace practices and worker characteristics:

$$\beta_{0i} = \alpha_0 + \sum_{j=1}^{T} \alpha_j Z_{ji},$$

where $Z_{ji}$ is the i-th firm’s value for the j-th trait in a list of workplace practices and worker characteristics; there are T items in the list in all.

Breaking the specification into two steps highlights the underlying Cobb–Douglas assumption. The specification actually reduces to a single garden variety linear regression with inputs per worker (K/L and M/L), workplace practices, and worker characteristics as explanators. What is not garden variety is the rest of the DGP. The Gauss–Markov Assumptions surely do not apply here. Each firm is observed several times; systematic differences among firms (some firms being fundamentally more or less productive than others) make it likely that the disturbances for each firm are correlated over time, even if the disturbances are independent across firms. Moreover, were the authors to gather a different sample of firms, they would surely obtain different values for the explanatory variables. The X’s are not fixed in repeated samples. And with random X-values comes the worry that those firms wise enough to choose beneficial X-values might also choose beneficial unmeasured practices: the inputs per worker and the workplace practices might be correlated with the disturbances. Little seems left of the Gauss–Markov Assumptions on which to build good estimators.

The wholesale failure of the Gauss–Markov Assumptions poses estimation problems for Black and Lynch. Is consistent estimation possible at all? Are the parameters in their model identified? And if the parameters are identified, how are they estimated? Black and Lynch argue that they know enough about the variances and covariances of the explanators and disturbances to identify the parameters in their model through several covariance and exclusion restrictions. Consistent estimation is at least possible. When the Gauss–Markov Assumptions fail thoroughly enough to make least squares unattractive, but we do have sufficient information about the variances and covariances of the explanators and disturbances to restrict numerous moments in the population, the generalized method of moments (GMM) offers an appealing estimation procedure. Black and Lynch argue that they know enough about the variances and covariances of the explanators and disturbances to rely on GMM to construct consistent, asymptotically normally distributed estimators of the $\alpha$’s and $\beta$’s. Given the large number of establishments in Black and Lynch’s sample, good asymptotic properties are an attractive basis for inference. The t-statistics and F-tests we are accustomed to analyzing are asymptotically valid here, so we can discuss the empirical results of Black and Lynch much as we would if they had used OLS under the Gauss–Markov Assumptions.

What do they find? The authors report that TQM systems do not raise productivity by themselves, but that allowing employees a greater voice in decisions does improve productivity (and to the extent that TQM includes such increased voice, TQM improves productivity). Moreover, report the authors, instituting profit sharing can have a positive effect on productivity, but only when the plan includes profit sharing for nonmanagerial employees. The authors further find that unionized establishments that increase worker voice and add profit sharing that includes nonmanagerial employees get a particularly large boost in productivity from such workplace policies. In contrast, productivity in unionized establishments that do not introduce such new workplace policies lags behind that in similarly un-innovative nonunionized establishments.

Black and Lynch also find that increasing use of computers can enhance productivity. Firms with higher levels of computer usage by nonmanagerial workers have higher productivity than those with less computer use.

Final Notes

Black and Lynch’s paper illustrates how everyday economic questions are sometimes best tackled with highly sophisticated empirical techniques. Relatively simple OLS would not reliably answer the questions Black and Lynch address. Instead, Black and Lynch combined covariance restrictions and exclusion restrictions grounded in their understanding of the data’s origins to provide both identification for their model and the moment restrictions necessary for conducting GMM estimation. Simple economics can require complex statistics. When the DGPs suitable for modeling real-world data depart far from the Gauss–Markov Assumptions, particularly complicated estimation strategies may be needed, rather than OLS or GLS.

■


neither the slope of the line, $\beta_1$, nor the intercept, $\beta_0$, is identified. We do not have enough prior information about where the data come from to allow us to consistently estimate these parameters. What additional information would identify the slope? Our earlier discussion exposes two possibilities. First, we might learn that there is, in fact, no intercept term in the model; excluding the intercept from the model would identify the slope. Second, we might learn that $E(X_i\varepsilon_i) = 0$, in which case the slope (and the intercept) would be identified; restricting the covariance between the explanators and disturbances to be zero would identify the slope of the line. Exclusion restrictions and covariance restrictions are not the only ways a model’s parameters become identified, but they are common strategies.

Instrumental variables (IV) estimation relies on exclusion and covariance restrictions for its consistency. When the covariance restriction that $E(X_i\varepsilon_i) = 0$ fails, IV can nonetheless consistently estimate the parameters of an equation, but only if a combination of exclusion restrictions and covariance restrictions hold. IV estimation requires that we know that $E(Z_i\varepsilon_i) = 0$ and $E(Z_iX_i) \neq 0$, which are covariance restrictions, and that the potential instrument, Z, is not itself a relevant explanator of the dependent variable, which is an exclusion restriction.

It is important to note that we ought not impose identifying restrictions arbitrarily. False restrictions misspecify the DGP and undermine both consistent estimation and valid statistical inference. Econometricians look to make plausible identifying assumptions and, when possible, to test whether the data support the identifying assumptions.

20.3 An Application: Military Service and Wages

Underidentification makes econometric analysis futile. Consistent estimators of an underidentified parameter do not exist. Essential to econometric success is identifying the parameters of interest, and to identify parameters requires that our DGP contain sufficient assumptions. Here we see how underidentification can threaten a specific empirical project and how an econometrician can use common sense and economic reasoning to impose assumptions on a DGP sufficient to identify parameters of interest.

Let’s begin with the question “Does military service enhance an individual’s future earning power?” Many people have thought it does. Unfortunately, the hypothesis long proved difficult to test. It might seem simple to specify:

$$W_i = \beta_0 + \beta_1 M_i + \varepsilon_i,$$

in which W is the person’s wage and M is a dummy variable indicating past military service, and to use OLS to estimate $\beta_1$, the effect of past military service on wages. Unfortunately, OLS is not a consistent estimator of $\beta_1$ because $E(M_i\varepsilon_i)$ usually does not equal zero; $\beta_1$ is underidentified. Why doesn’t $E(M_i\varepsilon_i)$ equal zero? Because individuals who join the military often differ from other people in traits that influence wages, but that the econometrician is unlikely to observe, such as self-discipline and self-confidence. Indeed, people often join the military with the specific intention of acquiring such traits. We can control for traits such as education, work experience, and gender, but some very personal characteristics that influence decisions to join the military and also influence wage prospects will not be measured and included in our data sets. These unmeasured traits are part of the disturbance term, and they may be different for people who serve and people who do not, so $E(M_i\varepsilon_i)$ may not equal zero. Military service may give ill-disciplined enlistees more discipline, but they may still be undisciplined enough that they suffer lower wages than otherwise similar workers. If enlistees have unmeasured traits that detract from their labor-market prospects, and if military service does not fully overcome those disadvantages, the estimated coefficient on past military service will be negative, despite what positive effect that service may have on those individuals’ earnings potential. Without $E(M_i\varepsilon_i) = 0$, we have only $E(\varepsilon_i) = 0$ with which to estimate $\beta_0$ and $\beta_1$. The coefficients are not identified.

Economist Josh Angrist of MIT explored how we might identify the effect of military service on wages.4 He decided that data from a draft lottery that took place during the Vietnam conflict would allow identification of the effect of military service on earnings. After carefully pondering the draft lottery, Angrist posed a DGP for the wages of workers who had been subject to that lottery. He posited two covariance restrictions and an exclusion restriction to identify $\beta_1$. In the draft lottery, inductees were selected according to their birth dates. A random draw of birth dates determined which birth dates would be drafted. It was a lottery few wanted to win. Because birth dates were unlikely to be correlated with individuals’ productivity characteristics, such as self-discipline or self-confidence, entering the military through the draft was unlikely to be correlated with those traits.

Angrist defined a dummy variable L to indicate that a person’s birthday was a lottery date. Angrist plausibly argued that $E(L_i\varepsilon_i) = 0$: individuals’ labor-market traits are unlikely to be correlated with their being born on a randomly selected date. Angrist also argued that $E(L_iM_i)$ does not equal zero: people who won the lottery were more likely than others to have military experience. $E(L_i\varepsilon_i) = 0$ and $E(L_iM_i) \neq 0$ were Angrist’s two covariance restrictions. Finally, Angrist argued that one’s lottery status should not itself directly influence wages, so he also excluded L from the wage equation. The method of moments estimator for $\beta_1$ based on $E(\varepsilon_i) = 0$ and $E(L_i\varepsilon_i) = 0$ is:

$$\tilde{\beta}_1 = \frac{\sum l_i w_i}{\sum l_i m_i},$$

where l, w, and m are L, W, and M measured as deviations from their own means. Angrist’s second covariance restriction, that $E(L_iM_i)$ does not equal zero, ensures that the denominator in the estimator is highly unlikely to be zero in large samples.
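The estimator above, the sum of products of lottery and wage deviations divided by the sum of products of lottery and service deviations, can be compared with OLS on simulated data. The sketch below does not use Angrist’s data; the DGP, the coefficient values, and every variable name are assumptions invented for illustration. An unmeasured trait raises the chance of serving and lowers wages, while the lottery indicator is drawn independently of that trait.

```python
import random


def deviations(values):
    """Measure a variable as deviations from its own mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


rng = random.Random(5)
n = 100_000
true_effect = 1.0  # assumed true effect of service on the wage

lottery, service, wage = [], [], []
for _ in range(n):
    trait = rng.gauss(0.0, 1.0)                 # unmeasured; enters the disturbance
    drawn = 1.0 if rng.random() < 0.5 else 0.0  # L_i, independent of the trait
    # Service is more likely when drawn, and more likely for high-trait people.
    served = 1.0 if drawn + trait + rng.gauss(0.0, 1.0) > 1.0 else 0.0
    disturbance = -1.5 * trait + rng.gauss(0.0, 0.5)
    lottery.append(drawn)
    service.append(served)
    wage.append(10.0 + true_effect * served + disturbance)

l = deviations(lottery)
m = deviations(service)
w = deviations(wage)

# Method of moments (instrumental variables) estimator: sum(l*w) / sum(l*m).
beta_iv = sum(li * wi for li, wi in zip(l, w)) / sum(li * mi for li, mi in zip(l, m))
# OLS slope for comparison: sum(m*w) / sum(m*m).
beta_ols = sum(mi * wi for mi, wi in zip(m, w)) / sum(mi * mi for mi in m)

print("true effect:", true_effect)
print("IV estimate:", beta_iv)
print("OLS estimate:", beta_ols)
```

In this assumed DGP the OLS coefficient comes out negative even though the true effect is positive, which mirrors the concern raised in the text, while the lottery-based estimator lands near the true value.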

Angrist implemented this estimator and concluded that military service does not improve future wage prospects. In the 1980s, years after their military service, white lottery veterans’ earnings were 15% less than those of comparable workers who had not served; black lottery veterans’ earnings were statistically indistinguishable from those of otherwise comparable nonveterans. Had Angrist not been able to argue for the plausibility of his covariance and exclusion restrictions, the effect of military service on earnings would not be identifiable from the earnings and draft lottery data. Because many economists were persuaded by Angrist’s assumptions, his estimates of the effect of military service gained widespread acceptance. Those economists who did not accept Angrist’s identifying assumptions remained unpersuaded by his parameter estimates.

An Organizational Structure for the Study of Econometrics

1. What is the DGP?

GMM requires restrictions on moments.

2. What Makes a Good Estimator?

Consistency

3. How Do We Create an Estimator?

Method of moments and generalized method of moments

4. What Are an Estimator’s Properties?

Identification is a minimal requirement for consistency.

GMM usually yields consistency.

5. How Do We Test Hypotheses?

Summary

The extension began by introducing a strategy, the method of moments, and its generalization, the generalized method of moments (GMM), for constructing asymptotically normally distributed consistent estimators in a vast array of DGPs. The assumptions about means, variances, and covariances that underpin GMM prove to be minimal requirements for consistent estimation of the parameters of a DGP. We learn from this that a DGP may contain too little information to support consistent estimation of some or all of its parameters. When a DGP suffers such a shortage of information, we say some or all of its parameters are underidentified.

The Law of Large Numbers and the Central Limit Theorem that imply the consistency and asymptotic normality of GMM estimators (including OLS estimators) rest on bounded variances. When explanators or dependent variables in our DGPs have unbounded variances, the normality and even the consistency of estimators become questionable.

Concepts for Review

Exactly identified
Generalized method of moments (GMM)
Just identified
Method of moments
Moments
Overidentified
Underidentified

Questions for Discussion

1. “This identification business is nonsense. I can always run a regression, barring multicollinearity. That gives me estimates of the parameters of interest. I can always use those estimates.” Agree or disagree, and discuss.

Problems for Analysis

1. For the DGP

$$Y_i = \beta X_i + \varepsilon_i$$

$$E(Z_iX_i) \neq 0$$

$$E(Z_i\varepsilon_i) = 0,$$

show that the method of moments estimator of the slope is

$$\beta_Z = \frac{\sum Z_iY_i}{\sum Z_iX_i}.$$

2. For the DGP in problem 1, show that the method of moments estimator $\beta_Z$ is consistent if

$$\operatorname{plim}\left(\frac{1}{n}\sum Z_iX_i\right) \neq 0 \quad \text{and} \quad \operatorname{plim}\left(\frac{1}{n}\sum Z_i\varepsilon_i\right) = 0.$$

3. Show that a valid instrumental variable estimator for a DGP with a straight line through the origin is also a method of moments estimator.


Endnotes

1. Ralph Waldo Emerson, “Self-Reliance,” in Essays, 1841.

2. Notice a special case here. If our DGP provides an increasing number of observations for which $X = 0$, we can consistently estimate $\beta_0$ by restricting attention to only those observations.

3. Sandra E. Black and Lisa M. Lynch, “How to Compete: The Impact of Workplace Practices and Information Technology on Productivity,” Review of Economics and Statistics 83, no. 3 (August 2001): 434–445.

4. Joshua Angrist, “Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records,” The American Economic Review 80, no. 3 (June 1990): 313–336.

Appendix 20.A

The Consistency of Method of Moments Estimators

Method of moments estimators are very often consistent. In particular, they are consistent whenever the Law of Large Numbers ensures that the sample moments converge in probability to the corresponding population moments. For example, in estimating a straight line with unknown slope and intercept, if plim$\left(\frac{1}{n}\sum \varepsilon_i\right) = 0$ and plim$\left(\frac{1}{n}\sum X_i\varepsilon_i\right) = 0$, then the method of moments estimators $\tilde{\beta}_0$ and $\tilde{\beta}_1$ are consistent. Because unbounded variances can undermine the Law of Large Numbers, the consistency of $\tilde{\beta}_0$ and $\tilde{\beta}_1$ requires that the joint distribution of the $X_i$ and the $\varepsilon_i$ be bounded appropriately.1

20.A.1 Proving Consistency

This section applies the rules for manipulating probability limits from Table 12.1 to prove the consistency of the method of moments estimators of the slope and intercept of a straight line, assuming that plim$\left(\frac{1}{n}\sum \varepsilon_i\right) = 0$ and plim$\left(\frac{1}{n}\sum X_i\varepsilon_i\right) = 0$. Assuming instead that $E(\varepsilon_i) = 0$ and that both the (homoskedastic) disturbances and (fixed) explanators have finite, nonzero variances would suffice to establish, by the Law of Large Numbers, that these two probability limits are zero.


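Because rule (iii) in Table 12.1 states that the plim of a sum is the sum of the plims, the convergence of the sample moments can be written

$$\operatorname{plim}\left[\frac{1}{n}\sum \varepsilon_i\right] = \operatorname{plim}\left[\frac{1}{n}\sum (Y_i - \beta_0 - \beta_1 X_i)\right] = \operatorname{plim}\left(\frac{1}{n}\sum Y_i\right) - \beta_0 - \beta_1 \operatorname{plim}\left(\frac{1}{n}\sum X_i\right) = 0 \tag{20.A.1}$$

and

$$\operatorname{plim}\left[\frac{1}{n}\sum X_i\varepsilon_i\right] = \operatorname{plim}\left[\frac{1}{n}\sum X_i(Y_i - \beta_0 - \beta_1 X_i)\right] = \operatorname{plim}\left(\frac{1}{n}\sum X_iY_i\right) - \beta_0 \operatorname{plim}\left(\frac{1}{n}\sum X_i\right) - \beta_1 \operatorname{plim}\left(\frac{1}{n}\sum X_i^2\right) = 0. \tag{20.A.2}$$

When the sample moments converge in probability to their population values, what the method of moments insists be true about the residuals in any sample proves to be true about the disturbances (in probability limit) as the sample size grows large. This buys consistency for the method of moments estimators. More formally, because the method of moments always sets $\frac{1}{n}\sum \tilde{e}_i$ and $\frac{1}{n}\sum X_i\tilde{e}_i$ equal to zero, their probability limits are also zero (according to the first rule in Table 12.1, that the plim of a constant is the constant):

$$\operatorname{plim}\left[\frac{1}{n}\sum \tilde{e}_i\right] = \operatorname{plim}\left[\frac{1}{n}\sum (Y_i - \tilde{\beta}_0 - \tilde{\beta}_1 X_i)\right] = \operatorname{plim}(0) = 0$$

and

$$\operatorname{plim}\left[\frac{1}{n}\sum X_i\tilde{e}_i\right] = \operatorname{plim}\left[\frac{1}{n}\sum X_i(Y_i - \tilde{\beta}_0 - \tilde{\beta}_1 X_i)\right] = \operatorname{plim}(0) = 0.$$

Applying rule (vi) from Table 12.1 (that the plim of a continuous function is the function of the plims) then yields

$$\operatorname{plim}\left[\frac{1}{n}\sum \tilde{e}_i\right] = \operatorname{plim}\left[\frac{1}{n}\sum (Y_i - \tilde{\beta}_0 - \tilde{\beta}_1 X_i)\right] = \operatorname{plim}\left(\frac{1}{n}\sum Y_i\right) - \operatorname{plim}(\tilde{\beta}_0) - \operatorname{plim}(\tilde{\beta}_1)\operatorname{plim}\left(\frac{1}{n}\sum X_i\right) = 0 \tag{20.A.3}$$

and

$$\operatorname{plim}\left[\frac{1}{n}\sum X_i\tilde{e}_i\right] = \operatorname{plim}\left[\frac{1}{n}\sum X_i(Y_i - \tilde{\beta}_0 - \tilde{\beta}_1 X_i)\right] = \operatorname{plim}\left(\frac{1}{n}\sum X_iY_i\right) - \operatorname{plim}(\tilde{\beta}_0)\operatorname{plim}\left(\frac{1}{n}\sum X_i\right) - \operatorname{plim}(\tilde{\beta}_1)\operatorname{plim}\left(\frac{1}{n}\sum X_i^2\right) = 0. \tag{20.A.4}$$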


Comparing Equations 20.A.3 and 20.A.4 with Equations 20.A.1 and 20.A.2 shows that plim$(\tilde{\beta}_0)$ and plim$(\tilde{\beta}_1)$ appear in Equations 20.A.3 and 20.A.4 just where $\beta_0$ and $\beta_1$ do in Equations 20.A.1 and 20.A.2. If Equations 20.A.3 and 20.A.4 have a unique solution for plim$(\tilde{\beta}_0)$ and plim$(\tilde{\beta}_1)$, then plim$(\tilde{\beta}_0) = \beta_0$ and plim$(\tilde{\beta}_1) = \beta_1$.

There is one case in which the solution of Equations 20.A.3 and 20.A.4 for plim$(\tilde{\beta}_0)$ and plim$(\tilde{\beta}_1)$ is not unique. If X takes on only one value, and is therefore perfectly collinear with the intercept term, Equation 20.A.4 is equivalent to Equation 20.A.3: Equation 20.A.4 reduces to Equation 20.A.3 if we divide both sides of Equation 20.A.4 by the constant value of X. In this case, we really have only one equation in two unknowns. And, in this special case of perfect multicollinearity, the two equations do not yield a unique solution for plim$(\tilde{\beta}_0)$ and plim$(\tilde{\beta}_1)$. Otherwise, the solution is unique, so plim$(\tilde{\beta}_0) = \beta_0$ and plim$(\tilde{\beta}_1) = \beta_1$.

Thus, barring perfect multicollinearity, when the sample moments converge in probability to their population expectations, $\tilde{\beta}_0$ and $\tilde{\beta}_1$ are consistent estimators. Barring perfect collinearity–like problems, method of moments estimators are generally consistent when the sample moments converge to their population values. Because it is the Law of Large Numbers that ensures that sample means converge in probability to their population analogs, the Law of Large Numbers is key to the consistency of method of moments estimators. As noted earlier, infinite variances in a DGP endanger the applicability of the Law of Large Numbers.
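The uniqueness argument can be made explicit by writing the two limiting equations as a linear system. The shorthand $m_Y$, $m_X$, $m_{XX}$, and $m_{XY}$ for the probability limits of the sample moments is introduced here only for this sketch and does not appear in the original derivation; the sketch also assumes those limits exist.

```latex
% Shorthand (assumed notation): probability limits of the sample moments.
%   m_Y = plim (1/n) \sum Y_i,      m_X  = plim (1/n) \sum X_i,
%   m_{XX} = plim (1/n) \sum X_i^2, m_{XY} = plim (1/n) \sum X_i Y_i.
\begin{align*}
  % The limiting moment equations, linear in the unknowns (b_0, b_1):
  m_Y    - b_0      - b_1 m_X    &= 0, \\
  m_{XY} - b_0 m_X  - b_1 m_{XX} &= 0.
\end{align*}
% Multiply the first equation by m_X and subtract it from the second:
\begin{align*}
  b_1 \left( m_{XX} - m_X^2 \right) &= m_{XY} - m_X m_Y .
\end{align*}
% If m_{XX} - m_X^2 \neq 0, the solution is unique:
\begin{align*}
  b_1 &= \frac{m_{XY} - m_X m_Y}{m_{XX} - m_X^2}, &
  b_0 &= m_Y - b_1 m_X .
\end{align*}
% The true parameters (\beta_0, \beta_1) and the probability limits
% (plim \tilde{\beta}_0, plim \tilde{\beta}_1) both solve this same system,
% so when the solution is unique the two pairs must coincide.
```

The quantity $m_{XX} - m_X^2$ is the limiting variance of X, so it equals zero when X takes on only one value, which is the perfect-collinearity case in which the two equations collapse into one.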

Endnotes

1. $\frac{1}{n}\sum X_i\varepsilon_i$ will converge in probability to zero if its expectation is zero and its variance goes to zero as n grows. By assumption, $E(X_i\varepsilon_i) = 0$, so the first criterion is met. The second criterion will also be met if we assume var$(X_i\varepsilon_i)$ (equal to $E(X_i^2\varepsilon_i^2)$) is a finite, nonzero constant, $\tau^4$, so that var$\left(\frac{1}{n}\sum X_i\varepsilon_i\right) = \tau^4/n$.
