multilevel regression and poststratification in stata › meeting › chicago11 › materials ›...

84
Introduction Stata command Simulations Conclusion References Multilevel Regression and Poststratification in Stata Maurizio Pisati 1 Valeria Glorioso 1,2 [email protected] [email protected] 1 Dept. of Sociology and Social Research University of Milano-Bicocca (Italy) 2 Dept. of Society, Human Development, and Health Harvard School of Public Health Stata Conference Chicago 2011 July 14-15 Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 1/49

Upload: others

Post on 30-Jun-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Multilevel Regression and Poststratificationin Stata

Maurizio Pisati1 Valeria Glorioso1,2

[email protected] [email protected]

1Dept. of Sociology and Social ResearchUniversity of Milano-Bicocca (Italy)

2Dept. of Society, Human Development, and HealthHarvard School of Public Health

Stata Conference Chicago 2011July 14-15

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 1/49

Page 2: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Outline

1 IntroductionThe problemThe solution

2 Stata command

3 Simulations

4 Conclusion

5 References

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 2/49

Page 3: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Outline

1 IntroductionThe problemThe solution

2 Stata command

3 Simulations

4 Conclusion

5 References

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 2/49

Page 4: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Outline

1 IntroductionThe problemThe solution

2 Stata command

3 Simulations

4 Conclusion

5 References

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 2/49

Page 5: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Outline

1 IntroductionThe problemThe solution

2 Stata command

3 Simulations

4 Conclusion

5 References

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 2/49

Page 6: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Outline

1 IntroductionThe problemThe solution

2 Stata command

3 Simulations

4 Conclusion

5 References

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 2/49

Page 7: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

Introduction

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 3/49

Page 8: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

A common research objective

• Sometimes social scientists are interested in determiningwhether, and to what extent, the distribution of a giventarget variable Y varies across K groups defined by thevalues of one or more covariates of interest

• Let G denote a discrete variable representing the K groupsunder comparison. Without loss of generality, G canrepresent either a single discrete covariate or thecross-classification of two or more discrete covariates

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 4/49

Page 9: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

A common research objective

• Sometimes social scientists are interested in determiningwhether, and to what extent, the distribution of a giventarget variable Y varies across K groups defined by thevalues of one or more covariates of interest

• Let G denote a discrete variable representing the K groupsunder comparison. Without loss of generality, G canrepresent either a single discrete covariate or thecross-classification of two or more discrete covariates

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 4/49

Page 10: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

A common research objective

• In symbols:

G =

MG∏g=1

Vg

where Π is the Cartesian product operator; MG denotesthe total number of covariates forming G; and Vg denotesthe gth covariate

• We will refer to G as the group variable

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 5/49

Page 11: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

A common research objective

• In symbols:

G =

MG∏g=1

Vg

where Π is the Cartesian product operator; MG denotesthe total number of covariates forming G; and Vg denotesthe gth covariate

• We will refer to G as the group variable

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 5/49

Page 12: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

A common research objective

• The (conditional) distribution of Y within each category kof G can be described as follows:

Yk ∼ f(θk, φk) for k = 1, . . . ,K

where f(· ) denotes a generic probability distribution; θkdenotes the expected value(s) of the distribution; and φkdenotes one or more additional parameters of thedistribution (e.g., its variance)

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 6/49

Page 13: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

A common research objective

• For the sake of simplicity, let us focus on the expectedvalue(s) of Y , so that our goal is to determine whether, andto what extent, the expected value(s) of Y varies/varyacross the K categories of G

• In terms of regression analysis, this amounts to estimatingthe K possible values of the regression functionE(Y |G = k), i.e., E(Y |G = 1) ≡ θ1, E(Y |G = 2) ≡ θ2, . . . ,E(Y |G = K) ≡ θK

• Let us denote our estimand – i.e., our quantity of interest –by θ ≡ {θk : k = 1, . . . ,K}

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 7/49

Page 14: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

A common research objective

• For the sake of simplicity, let us focus on the expectedvalue(s) of Y , so that our goal is to determine whether, andto what extent, the expected value(s) of Y varies/varyacross the K categories of G

• In terms of regression analysis, this amounts to estimatingthe K possible values of the regression functionE(Y |G = k), i.e., E(Y |G = 1) ≡ θ1, E(Y |G = 2) ≡ θ2, . . . ,E(Y |G = K) ≡ θK

• Let us denote our estimand – i.e., our quantity of interest –by θ ≡ {θk : k = 1, . . . ,K}

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 7/49

Page 15: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

A common research objective

• For the sake of simplicity, let us focus on the expectedvalue(s) of Y , so that our goal is to determine whether, andto what extent, the expected value(s) of Y varies/varyacross the K categories of G

• In terms of regression analysis, this amounts to estimatingthe K possible values of the regression functionE(Y |G = k), i.e., E(Y |G = 1) ≡ θ1, E(Y |G = 2) ≡ θ2, . . . ,E(Y |G = K) ≡ θK

• Let us denote our estimand – i.e., our quantity of interest –by θ ≡ {θk : k = 1, . . . ,K}

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 7/49

Page 16: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

Estimating θ

• How do we get accurate – i.e., precise and unbiased –estimates of θ?

• For the sake of simplicity, let us suppose that (a)observations are sampled from a given target population,and (b) the data of interest are collected withoutmeasurement error, so that the only source of randomestimation error is the sampling variance, and the only(possible) source of systematic estimation error is theselection bias

• The expression “selection bias” is used here as a shorthandfor the sum of coverage bias, nonresponse bias, andsampling bias (Groves 1989)

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 8/49

Page 17: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

Estimating θ

• How do we get accurate – i.e., precise and unbiased –estimates of θ?

• For the sake of simplicity, let us suppose that (a)observations are sampled from a given target population,and (b) the data of interest are collected withoutmeasurement error, so that the only source of randomestimation error is the sampling variance, and the only(possible) source of systematic estimation error is theselection bias

• The expression “selection bias” is used here as a shorthandfor the sum of coverage bias, nonresponse bias, andsampling bias (Groves 1989)

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 8/49

Page 18: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

Estimating θ

• How do we get accurate – i.e., precise and unbiased –estimates of θ?

• For the sake of simplicity, let us suppose that (a)observations are sampled from a given target population,and (b) the data of interest are collected withoutmeasurement error, so that the only source of randomestimation error is the sampling variance, and the only(possible) source of systematic estimation error is theselection bias

• The expression “selection bias” is used here as a shorthandfor the sum of coverage bias, nonresponse bias, andsampling bias (Groves 1989)

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 8/49

Page 19: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

Estimating θ

• The standard (maximum likelihood) estimator of eachelement θk of θ is:

θ̂k ≡ ̂E(Y |G = k) =

nk∑i=1

Yi

nk

where nk denotes the number of valid sample observationswithin category k of variable G

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 9/49

Page 20: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

Estimating θ

• When nk is small, θ̂k tends to be very unprecise, i.e., togenerate highly variable estimates of θk

• The accuracy of θ̂k decreases further if the data object ofanalysis are affected by selection bias, i.e., if the validobservations are a nonrandom sample of the targetpopulation and the process of selection into the sample isassociated with one or more variables that are alsoassociated with variable Y

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 10/49

Page 21: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

Estimating θ

• When nk is small, θ̂k tends to be very unprecise, i.e., togenerate highly variable estimates of θk

• The accuracy of θ̂k decreases further if the data object ofanalysis are affected by selection bias, i.e., if the validobservations are a nonrandom sample of the targetpopulation and the process of selection into the sample isassociated with one or more variables that are alsoassociated with variable Y

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 10/49

Page 22: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

Here’s Mr. P

• For all those cases where the number of valid observationswithin one or more categories of G is small and/orcollected data are affected by selection bias, relativelyaccurate estimates of θ can be obtained by using a propercombination of multilevel regression modeling andpoststratification (henceforth MrP)

• This approach has been devised by Andrew Gelman andcolleagues (Gelman and Little 1997; Park, Gelman andBafumi 2004; Park, Gelman and Bafumi 2006; Gelman andHill 2007) and recently elaborated on by Kastellec, Lax andPhillips (Lax and Phillips 2009a; Lax and Phillips 2009b;Kastellec, Lax and Phillips 2010)

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 11/49

Page 23: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

Here’s Mr. P

• For all those cases where the number of valid observationswithin one or more categories of G is small and/orcollected data are affected by selection bias, relativelyaccurate estimates of θ can be obtained by using a propercombination of multilevel regression modeling andpoststratification (henceforth MrP)

• This approach has been devised by Andrew Gelman andcolleagues (Gelman and Little 1997; Park, Gelman andBafumi 2004; Park, Gelman and Bafumi 2006; Gelman andHill 2007) and recently elaborated on by Kastellec, Lax andPhillips (Lax and Phillips 2009a; Lax and Phillips 2009b;Kastellec, Lax and Phillips 2010)

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 11/49

Page 24: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

The MrP estimator

• The MrP estimator of θ – which we will denote by θ̃ – canbe described as a four-step procedure as follows:

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 12/49

Page 25: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

The MrP estimator

• First: Identify one or more covariates that might possiblybe responsible for selection bias. Without loss of generality,let C denote a discrete variable representing thecross-classification of these covariates.

In symbols:

C =

MC∏c=1

Vc

where Π is the Cartesian product operator; MC denotesthe total number of covariates forming C; and Vc denotesthe cth covariate.

We will refer to C as the composition variable

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 13/49

Page 26: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

The MrP estimator

• Second: Define the new estimand γ ≡ {γkl : k = 1, . . . ,K;l = 1, . . . , L}, where γkl ≡ E(Y |G = k,C = l); k indexesthe K categories of variable G as above; and l indexes theL categories of variable C

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 14/49

Page 27: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

The MrP estimator

• Third: Use a properly specified multilevel regressionmodel to estimate γ

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 15/49

Page 28: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

The MrP estimator

• Fourth: Compute the estimate of each element θk of θ asa weighted sum of the proper subset of γ̂:

θ̃k =

L∑l=1

γ̂klwl|k

where wl|k = Nkl/Nk; Nk denotes the number of membersof the target population who belong in category k ofvariable G; and Nkl denotes the number of members of thetarget population who belong in category k of variable Gand in category l of variable C

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 16/49

Page 29: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

The MrP estimator: Advantages

• The use of multilevel regression modeling (step 3 above)helps to increase precision

• If the composition variable C is carefully defined,poststratification (step 4 above) helps to decrease bias

• In sum, we expect MrP to be a relatively accurateestimator of θ

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 17/49

Page 30: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

The MrP estimator: Advantages

• The use of multilevel regression modeling (step 3 above)helps to increase precision

• If the composition variable C is carefully defined,poststratification (step 4 above) helps to decrease bias

• In sum, we expect MrP to be a relatively accurateestimator of θ

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 17/49

Page 31: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

The MrP estimator: Advantages

• The use of multilevel regression modeling (step 3 above)helps to increase precision

• If the composition variable C is carefully defined,poststratification (step 4 above) helps to decrease bias

• In sum, we expect MrP to be a relatively accurateestimator of θ

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 17/49

Page 32: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

The MrP estimator: Disadvantages

• We need to have population data – or, at least, asufficiently accurate estimate of it – for the full G× Ccross-classification; this might limit the definition of C

• To get good estimates of γ, the multilevel regression modelmust be specified very carefully – but this caveat applies toany kind of regression model

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 18/49

Page 33: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

The problemThe solution

The MrP estimator: Disadvantages

• We need to have population data – or, at least, asufficiently accurate estimate of it – for the full G× Ccross-classification; this might limit the definition of C

• To get good estimates of γ, the multilevel regression modelmust be specified very carefully – but this caveat applies toany kind of regression model

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 18/49

Page 34: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Stata command

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 19/49

Page 35: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

mrp – a Stata implementation of MrP

• mrp is a novel user-written Stata command thatimplements the MrP estimator outlined above

• Basically, mrp requests the user to specify (a) the targetvariable Y ; (b) the list of covariates forming the groupvariable G; (c) the list of covariates forming thecomposition variable C; (d) the multilevel regressioncommand appropriate to the problem at hand (e.g.,xtmixed); (e) the list of “fixed effects”; (f) the list of“random effects”; and (g) the name of a properly arrangeddataset contaning the population totals Nkl

• The basic output of mrp is an estimate of the K values ofthe regression function E(Y |G = k), i.e., of the K elementsof θ

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 20/49

Page 36: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

mrp – a Stata implementation of MrP

• mrp is a novel user-written Stata command thatimplements the MrP estimator outlined above

• Basically, mrp requests the user to specify (a) the targetvariable Y ; (b) the list of covariates forming the groupvariable G; (c) the list of covariates forming thecomposition variable C; (d) the multilevel regressioncommand appropriate to the problem at hand (e.g.,xtmixed); (e) the list of “fixed effects”; (f) the list of“random effects”; and (g) the name of a properly arrangeddataset contaning the population totals Nkl

• The basic output of mrp is an estimate of the K values ofthe regression function E(Y |G = k), i.e., of the K elementsof θ

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 20/49

Page 37: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

mrp – a Stata implementation of MrP

• mrp is a novel user-written Stata command thatimplements the MrP estimator outlined above

• Basically, mrp requests the user to specify (a) the targetvariable Y ; (b) the list of covariates forming the groupvariable G; (c) the list of covariates forming thecomposition variable C; (d) the multilevel regressioncommand appropriate to the problem at hand (e.g.,xtmixed); (e) the list of “fixed effects”; (f) the list of“random effects”; and (g) the name of a properly arrangeddataset contaning the population totals Nkl

• The basic output of mrp is an estimate of the K values ofthe regression function E(Y |G = k), i.e., of the K elementsof θ

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 20/49

Page 38: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Example (based on simulation)

• Our objective is to describe the extent to which theproportion of Italian adults who attend Catholic Massregularly varies across Italian regions

• To this aim, a simple random sample of 2,000 units isdrawn from the target population (Italian men and womenaged 18+), and each sampled unit is contacted for interview

• Only 984 subjects accept to participate in the survey. Theresponse rate turns out to be higher among women andpositively correlated with age and educational level

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 21/49

Page 39: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Example (based on simulation)

• Our objective is to describe the extent to which theproportion of Italian adults who attend Catholic Massregularly varies across Italian regions

• To this aim, a simple random sample of 2,000 units isdrawn from the target population (Italian men and womenaged 18+), and each sampled unit is contacted for interview

• Only 984 subjects accept to participate in the survey. Theresponse rate turns out to be higher among women andpositively correlated with age and educational level

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 21/49

Page 40: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Example (based on simulation)

• Our objective is to describe the extent to which theproportion of Italian adults who attend Catholic Massregularly varies across Italian regions

• To this aim, a simple random sample of 2,000 units isdrawn from the target population (Italian men and womenaged 18+), and each sampled unit is contacted for interview

• Only 984 subjects accept to participate in the survey. Theresponse rate turns out to be higher among women andpositively correlated with age and educational level

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 21/49

Page 41: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Example

• Since the number of valid observations within each region kis generally small (min(nk)=30, max(nk)=97), thestandard estimator of θ will be very unprecise

• Moreover, since sex, age, and educational level areassociated with Catholic Mass attendance, the standardestimator of θ will likely be affected by selection bias

• In an attempt to increase precision and decrease bias, weestimate θ using the new Stata command mrp

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 22/49

Page 42: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Example

• Since the number of valid observations within each region kis generally small (min(nk)=30, max(nk)=97), thestandard estimator of θ will be very unprecise

• Moreover, since sex, age, and educational level areassociated with Catholic Mass attendance, the standardestimator of θ will likely be affected by selection bias

• In an attempt to increase precision and decrease bias, weestimate θ using the new Stata command mrp

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 22/49

Page 43: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Example

• Since the number of valid observations within each region kis generally small (min(nk)=30, max(nk)=97), thestandard estimator of θ will be very unprecise

• Moreover, since sex, age, and educational level areassociated with Catholic Mass attendance, the standardestimator of θ will likely be affected by selection bias

• In an attempt to increase precision and decrease bias, weestimate θ using the new Stata command mrp

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 22/49

Page 44: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Example: mrp specificationTarget variable

mrp church, g(region relmar|region) c(sex age edu) ///regcommand(xtmixed) binomial ///fe(relmar) re(i.age i.edu i.sex i.region) ///popref("PopRef.dta") npop(N) ///percent

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 23/49

Page 45: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Example: mrp specificationList of covariates forming group variable G

mrp church, g(region relmar|region) c(sex age edu) ///regcommand(xtmixed) binomial ///fe(relmar) re(i.age i.edu i.sex i.region) ///popref("PopRef.dta") npop(N) ///percent

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 24/49

Page 46: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Example: mrp specificationList of covariates forming composition variable C

mrp church, g(region relmar|region) c(sex age edu) ///regcommand(xtmixed) binomial ///fe(relmar) re(i.age i.edu i.sex i.region) ///popref("PopRef.dta") npop(N) ///percent

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 25/49

Page 47: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Example: mrp specificationMultilevel regression command

mrp church, g(region relmar|region) c(sex age edu) ///regcommand(xtmixed) binomial ///fe(relmar) re(i.age i.edu i.sex i.region) ///popref("PopRef.dta") npop(N) ///percent

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 26/49

Page 48: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Example: mrp specificationList of “fixed effects”

mrp church, g(region relmar|region) c(sex age edu) ///regcommand(xtmixed) binomial ///fe(relmar) re(i.age i.edu i.sex i.region) ///popref("PopRef.dta") npop(N) ///percent

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 27/49

Page 49: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Example: mrp specificationList of “random effects”

mrp church, g(region relmar|region) c(sex age edu) ///regcommand(xtmixed) binomial ///fe(relmar) re(i.age i.edu i.sex i.region) ///popref("PopRef.dta") npop(N) ///percent

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 28/49

Page 50: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Example: mrp specificationDataset and variable containing population totals Nkl

mrp church, g(region relmar|region) c(sex age edu) ///regcommand(xtmixed) binomial ///fe(relmar) re(i.age i.edu i.sex i.region) ///popref("PopRef.dta") npop(N) ///percent

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 29/49

Page 51: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Example: mrp specificationScale option (converts proportions into percentages)

mrp church, g(region relmar|region) c(sex age edu) ///regcommand(xtmixed) binomial ///fe(relmar) re(i.age i.edu i.sex i.region) ///popref("PopRef.dta") npop(N) ///percent

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 30/49

Page 52: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Example: ResultsDot = True population value, S = Standard estimate, M = MrP estimate

SS

SS

SSS

SS

SSS

SS

SS

SS

S

MM

MM

MMM

MM

MM

MM

MM

MM

MM

PiemonteLombardia

Trentino-Alto AdigeVeneto

Friuli-Venezia GiuliaLiguria

Emilia-RomagnaToscanaUmbriaMarche

LazioAbruzzo

MoliseCampania

PugliaBasilicata

CalabriaSicilia

Sardegna0 10 20 30 40 50 60 70

E(Y | G = k)

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 31/49

Page 53: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Simulations

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 32/49

Page 54: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Quantity of interest

• We used Monte Carlo simulation to evaluate theperformance of three estimators of θ in a research settinganalogous to the one illustrated in the example above

• Our underlying research objective is to describe the extentto which the proportion of Italian adults who attendCatholic Mass regularly varies across 19 of the 20 regionsinto which Italy is subdivided (the 20th region, Valled’Aosta, is excluded from the analysis because of itspeculiarities)

• Thus, our quantity of interest θ corresponds to the K = 19values of the regression function E(Y |G = k), where thetarget variable Y is a binary indicator of regular CatholicMass attendance, and the group variable G is the region ofresidence

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 33/49

Page 55: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Quantity of interest

• We used Monte Carlo simulation to evaluate theperformance of three estimators of θ in a research settinganalogous to the one illustrated in the example above

• Our underlying research objective is to describe the extentto which the proportion of Italian adults who attendCatholic Mass regularly varies across 19 of the 20 regionsinto which Italy is subdivided (the 20th region, Valled’Aosta, is excluded from the analysis because of itspeculiarities)

• Thus, our quantity of interest θ corresponds to the K = 19values of the regression function E(Y |G = k), where thetarget variable Y is a binary indicator of regular CatholicMass attendance, and the group variable G is the region ofresidence

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 33/49

Page 56: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Quantity of interest

• We used Monte Carlo simulation to evaluate theperformance of three estimators of θ in a research settinganalogous to the one illustrated in the example above

• Our underlying research objective is to describe the extentto which the proportion of Italian adults who attendCatholic Mass regularly varies across 19 of the 20 regionsinto which Italy is subdivided (the 20th region, Valled’Aosta, is excluded from the analysis because of itspeculiarities)

• Thus, our quantity of interest θ corresponds to the K = 19values of the regression function E(Y |G = k), where thetarget variable Y is a binary indicator of regular CatholicMass attendance, and the group variable G is the region ofresidence

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 33/49

Page 57: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Procedure

• For each estimator, we followed a three-step procedure:

1 First, we simulated 1,000 sample surveys, using as thesampling frame a large dataset (N = 251, 708) that mimicsthe socio-demographic structure of the full Italian adultpopulation and contains complete information on thefollowing individual characteristics: region of residence(region), sex (sex), age (age), educational level (edu), andCatholic Mass attendance (church)

2 Second, we used the data collected in each simulated surveyto estimate the quantity of interest, thus getting a simulatedsampling distribution of θ made of 1,000 estimates

3 Finally, we evaluated the estimator in question bycomputing its bias, empirical standard error, and root meansquare error

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 34/49

Page 58: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Procedure

• For each estimator, we followed a three-step procedure:

1 First, we simulated 1,000 sample surveys, using as thesampling frame a large dataset (N = 251, 708) that mimicsthe socio-demographic structure of the full Italian adultpopulation and contains complete information on thefollowing individual characteristics: region of residence(region), sex (sex), age (age), educational level (edu), andCatholic Mass attendance (church)

2 Second, we used the data collected in each simulated surveyto estimate the quantity of interest, thus getting a simulatedsampling distribution of θ made of 1,000 estimates

3 Finally, we evaluated the estimator in question bycomputing its bias, empirical standard error, and root meansquare error

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 34/49

Page 59: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Procedure

• For each estimator, we followed a three-step procedure:

1 First, we simulated 1,000 sample surveys, using as thesampling frame a large dataset (N = 251, 708) that mimicsthe socio-demographic structure of the full Italian adultpopulation and contains complete information on thefollowing individual characteristics: region of residence(region), sex (sex), age (age), educational level (edu), andCatholic Mass attendance (church)

2 Second, we used the data collected in each simulated surveyto estimate the quantity of interest, thus getting a simulatedsampling distribution of θ made of 1,000 estimates

3 Finally, we evaluated the estimator in question bycomputing its bias, empirical standard error, and root meansquare error

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 34/49

Page 60: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Procedure

• For each estimator, we followed a three-step procedure:

1 First, we simulated 1,000 sample surveys, using as thesampling frame a large dataset (N = 251, 708) that mimicsthe socio-demographic structure of the full Italian adultpopulation and contains complete information on thefollowing individual characteristics: region of residence(region), sex (sex), age (age), educational level (edu), andCatholic Mass attendance (church)

2 Second, we used the data collected in each simulated surveyto estimate the quantity of interest, thus getting a simulatedsampling distribution of θ made of 1,000 estimates

3 Finally, we evaluated the estimator in question bycomputing its bias, empirical standard error, and root meansquare error

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 34/49

Page 61: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Survey specifications

• Sampling method: Simple random sampling

• Initial sample size: n = 2, 000

• Response rate: Each sampled unit is selected into the finalsample with a probability determined by his/her sex, age,and educational level. Such probabilities range from aminimum of 20% (poorly-educated men aged 18-44) to amaximum of 100 % (highly-educated women aged 65+)

• Final sample size: mean(n)=970, min(n)=897,max(n)=1,035

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 35/49

Page 62: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Survey specifications

• Sampling method: Simple random sampling

• Initial sample size: n = 2, 000

• Response rate: Each sampled unit is selected into the finalsample with a probability determined by his/her sex, age,and educational level. Such probabilities range from aminimum of 20% (poorly-educated men aged 18-44) to amaximum of 100 % (highly-educated women aged 65+)

• Final sample size: mean(n)=970, min(n)=897,max(n)=1,035

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 35/49

Page 63: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Survey specifications

• Sampling method: Simple random sampling

• Initial sample size: n = 2, 000

• Response rate: Each sampled unit is selected into the finalsample with a probability determined by his/her sex, age,and educational level. Such probabilities range from aminimum of 20% (poorly-educated men aged 18-44) to amaximum of 100 % (highly-educated women aged 65+)

• Final sample size: mean(n)=970, min(n)=897,max(n)=1,035

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 35/49

Page 64: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Survey specifications

• Sampling method: Simple random sampling

• Initial sample size: n = 2, 000

• Response rate: Each sampled unit is selected into the finalsample with a probability determined by his/her sex, age,and educational level. Such probabilities range from aminimum of 20% (poorly-educated men aged 18-44) to amaximum of 100 % (highly-educated women aged 65+)

• Final sample size: mean(n)=970, min(n)=897,max(n)=1,035

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 35/49

Page 65: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Estimator 1Standard (std)

• The standard estimator of each element θk of θ is definedas follows:

θ̂k =

nk∑i=1

churchi

nk

where churchi takes value 1 when subject i attendsCatholic Mass regularly, value 0 otherwise; and nk denotesthe number of valid sample observations within region ofresidence k

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 36/49

Page 66: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Estimator 2Multilevel Regression with Poststratification (mrp)

• The MrP estimator of each element θk of θ is defined asfollows:

θ̃k =

L∑l=1

γ̂klwl|k

where all symbols are defined as in slides 14-16 above

• The estimation of parameters γkl requires that thecomposition variable C be previously defined

• In our case, we define C as the cross-classification of threecategorical covariates: sex (2 levels), age (4 levels), andedu (3 levels). Therefore, L = 2× 4× 3 = 24

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 37/49

Page 67: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Estimator 2Multilevel Regression with Poststratification (mrp)

• The MrP estimator of each element θk of θ is defined asfollows:

θ̃k =

L∑l=1

γ̂klwl|k

where all symbols are defined as in slides 14-16 above

• The estimation of parameters γkl requires that thecomposition variable C be previously defined

• In our case, we define C as the cross-classification of threecategorical covariates: sex (2 levels), age (4 levels), andedu (3 levels). Therefore, L = 2× 4× 3 = 24

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 37/49

Page 68: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Estimator 2Multilevel Regression with Poststratification (mrp)

• The MrP estimator of each element θk of θ is defined asfollows:

θ̃k =

L∑l=1

γ̂klwl|k

where all symbols are defined as in slides 14-16 above

• The estimation of parameters γkl requires that thecomposition variable C be previously defined

• In our case, we define C as the cross-classification of threecategorical covariates: sex (2 levels), age (4 levels), andedu (3 levels). Therefore, L = 2× 4× 3 = 24

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 37/49

Page 69: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Estimator 2Multilevel Regression with Poststratification (mrp)

• Given the definition of composition variable C, theparameters γkl are estimated using the following multilevelregression model:

γkl = β0 + αregionk + αsexr[l] + α

age

s[l] + αedut[l]

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 38/49

Page 70: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Estimator 2Multilevel Regression with Poststratification (mrp)

where

αregionk ∼ N(βrelmar · relmar, σ2

region) for k = 1, . . . , 19

αsexr ∼ N(0, σ2sex) for r = 1, . . . , 2

αages ∼ N(0, σ2age) for s = 1, . . . , 4

αedut ∼ N(0, σ2edu) for t = 1, . . . , 3

and relmar is a region-level variable that expresses thepercentage of religious marriages in each region

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 39/49

Page 71: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Estimator 3Standard Regression with Poststratification (srp)

• The SrP estimator of each element θk of θ is defined asfollows:

θ̈k =

L∑l=1

γ̂klwl|k

where all symbols are defined as above

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 40/49

Page 72: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Estimator 3Standard Regression with Poststratification (srp)

• The SrP estimator has the same general form as the MrPestimator, but in the SrP estimator the parameters γkl areestimated using a standard logistic regression model asfollows:

γkl = invlogit(β0 + βrelmar · relmar + βregionk · regionk+

+ βsexr · sexr[l] + βages · ages[l] + βedut · edut[l])

where βregion1 = β

region2 = βsex1 = β

age1 = βedu1 = 0

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 41/49

Page 73: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Results: Bias

stdstd

stdstd

stdstd

stdstdstd

stdstd

stdstd

stdstdstdstd

stdstd

srpsrp

srpsrp

srpsrpsrpsrpsrp

srpsrp

srpsrp

srpsrp

srpsrp

srpsrp

mrpmrp

mrpmrp

mrpmrpmrp

mrpmrp

mrpmrp

mrpmrp

mrpmrp

mrpmrp

mrpmrp

33%

39%

50%

44%

29%

26%

24%

25%

30%

45%

31%

35%

40%

46%

43%

36%

40%

41%

31%

!k

PiemonteLombardia

Trentino-Alto AdigeVeneto

Friuli-Venezia GiuliaLiguria

Emilia-RomagnaToscanaUmbriaMarche

LazioAbruzzo

MoliseCampania

PugliaBasilicata

CalabriaSicilia

Sardegna-3 -2 -1 0 1 2 3 4 5 6

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 42/49

Page 74: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Results: Empirical standard error

stdstd

stdstd

stdstd

stdstd

stdstd

stdstd

stdstd

stdstd

stdstd

std

srpsrp

srpsrp

srpsrp

srpsrp

srpsrp

srpsrp

srpsrp

srpsrp

srpsrp

srp

mrpmrp

mrpmrpmrpmrpmrp

mrpmrpmrp

mrpmrpmrp

mrpmrp

mrpmrpmrp

mrp

33%

39%

50%

44%

29%

26%

24%

25%

30%

45%

31%

35%

40%

46%

43%

36%

40%

41%

31%

!k

PiemonteLombardia

Trentino-Alto AdigeVeneto

Friuli-Venezia GiuliaLiguria

Emilia-RomagnaToscanaUmbriaMarche

LazioAbruzzo

MoliseCampania

PugliaBasilicata

CalabriaSicilia

Sardegna0 1 2 3 4 5 6 7 8 9 10

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 43/49

Page 75: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Results: Root mean square error

stdstd

stdstd

stdstd

stdstd

stdstd

stdstd

stdstd

stdstd

stdstd

std

srpsrp

srpsrp

srpsrp

srpsrp

srpsrp

srpsrp

srpsrp

srpsrp

srpsrp

srp

mrpmrp

mrpmrp

mrpmrpmrp

mrpmrpmrp

mrpmrp

mrpmrp

mrpmrp

mrpmrp

mrp

33%

39%

50%

44%

29%

26%

24%

25%

30%

45%

31%

35%

40%

46%

43%

36%

40%

41%

31%

!k

PiemonteLombardia

Trentino-Alto AdigeVeneto

Friuli-Venezia GiuliaLiguria

Emilia-RomagnaToscanaUmbriaMarche

LazioAbruzzo

MoliseCampania

PugliaBasilicata

CalabriaSicilia

Sardegna0 1 2 3 4 5 6 7 8 9 10 11

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 44/49

Page 76: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Results: Summary

• Bias: In absolute terms, the MrP estimator exhibits littlebias – in most cases less than one percentage point, for anaverage true value of 36%. Comparatively, it exhibitssignificantly less bias than the standard estimator andslightly more bias than the SrP estimator

• Precision: The MrP estimator is significantly moreprecise (i.e., less variable) than both the standardestimator and the SrP estimator

• Accuracy: Combining bias and precision, we can concludethat the MrP estimator is 1 to 4 times more accurate thanthe standard estimator and 1 to 3 times more accuratethan the SrP estimator

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 45/49

Page 77: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Results: Summary

• Bias: In absolute terms, the MrP estimator exhibits littlebias – in most cases less than one percentage point, for anaverage true value of 36%. Comparatively, it exhibitssignificantly less bias than the standard estimator andslightly more bias than the SrP estimator

• Precision: The MrP estimator is significantly moreprecise (i.e., less variable) than both the standardestimator and the SrP estimator

• Accuracy: Combining bias and precision, we can concludethat the MrP estimator is 1 to 4 times more accurate thanthe standard estimator and 1 to 3 times more accuratethan the SrP estimator

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 45/49

Page 78: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Results: Summary

• Bias: In absolute terms, the MrP estimator exhibits littlebias – in most cases less than one percentage point, for anaverage true value of 36%. Comparatively, it exhibitssignificantly less bias than the standard estimator andslightly more bias than the SrP estimator

• Precision: The MrP estimator is significantly moreprecise (i.e., less variable) than both the standardestimator and the SrP estimator

• Accuracy: Combining bias and precision, we can concludethat the MrP estimator is 1 to 4 times more accurate thanthe standard estimator and 1 to 3 times more accuratethan the SrP estimator

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 45/49

Page 79: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Conclusion

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 46/49

Page 80: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Conclusion

• mrp is still at alpha stage and it will take a few monthsbefore it reaches a publishable form

• mrp is part of a larger project on the analysis of variationin Stata, and eventually it will be subsumed under a moregeneral command for the analysis of association

• Part of the work presented here was carried out whileMaurizio Pisati was a visiting scholar at the Institute forQuantitative Social Science at Harvard University, andValeria Glorioso was a visiting student researcher at theDepartment of Society, Human Development, and Healthof the Harvard School of Public Health

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 47/49

Page 81: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Conclusion

• mrp is still at alpha stage and it will take a few monthsbefore it reaches a publishable form

• mrp is part of a larger project on the analysis of variationin Stata, and eventually it will be subsumed under a moregeneral command for the analysis of association

• Part of the work presented here was carried out whileMaurizio Pisati was a visiting scholar at the Institute forQuantitative Social Science at Harvard University, andValeria Glorioso was a visiting student researcher at theDepartment of Society, Human Development, and Healthof the Harvard School of Public Health

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 47/49

Page 82: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

Conclusion

• mrp is still at alpha stage and it will take a few monthsbefore it reaches a publishable form

• mrp is part of a larger project on the analysis of variationin Stata, and eventually it will be subsumed under a moregeneral command for the analysis of association

• Part of the work presented here was carried out whileMaurizio Pisati was a visiting scholar at the Institute forQuantitative Social Science at Harvard University, andValeria Glorioso was a visiting student researcher at theDepartment of Society, Human Development, and Healthof the Harvard School of Public Health

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 47/49

Page 83: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

References

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 48/49

Page 84: Multilevel Regression and Poststratification in Stata › meeting › chicago11 › materials › chi11_pisat… · Multilevel Regression and Poststrati cation in Stata Maurizio Pisati1

IntroductionStata command

SimulationsConclusionReferences

References

• Gelman, A. and J. Hill. 2007. Data Analysis Using Regression andMultilevel/Hierarchical Models. Cambridge: Cambridge University Press.

• Gelman, A. and T.C. Little. 1997. Poststratification into many categoriesusing hierarchical logistic regression. Survey Methodology 23: 127–135.

• Groves, R.M. 1989. Survey Errors and Survey Costs. New York: Wiley.

• Kastellec, J., Lax, J.R. and J.H. Phillips. 2010. Public opinion and Senateconfirmation of Supreme Court nominees. Journal of Politics 72: 767–784.

• Lax, J.R. and J.H. Phillips. 2009a. How should we estimate public opinionin the States?. American Journal of Political Science 53: 107–121.

• Lax, J.R. and J.H. Phillips. 2009b. Gay rights in the States: Public opinionand policy responsiveness. American Political Science Review 103: 367–386.

• Park, D.K., Gelman, A. and J. Bafumi. 2004. Bayesian multilevelestimation with poststratification: State-level estimates from national polls.Political Analysis 12: 375–385.

• Park, D.K., Gelman, A. and J. Bafumi. 2006. State level opinions fromnational surveys: Poststratification using multilevel logistic regression. InPublic Opinion in State Politics. Ed. J.E. Cohen. Stanford, CA: StanfordUniversity Press, 209–228.

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 49/49