Limits of data disaggregation in household surveys for population subgroups and geographical areas and the requirements to overcome them

Application to poverty mapping in Palestine

Isabel Molina and Eduardo García, Dept. of Statistics, Univ. Carlos III de Madrid

2019. 10. 11.

INTRODUCTION · INDIRECT ESTIMATION · APPLICATION · CONCLUSIONS



Palestinian Expenditure Consumption Survey (PECS) 2016/17

• Sample size: n = 18,363 persons out of N = 4,266,953 (43 out of 10,000)

• Sample sizes of regions by gender are fine:

          Gaza   West Bank   Total
Women     2569        6550    9119
Men       2578        6666    9244
Total     5147       13216   18363


Palestinian Expenditure Consumption Survey (PECS) 2016/17

• What if we wish to estimate at the local level?

• 319 localities in census: 162 in PECS, 157 unsampled.

• Sample sizes of localities by gender:

          Min   Q1   Median   Mean    Q3     Max
Women      14   26       35   56.29   61.5   405
Men        13   28       36   57.06   63     464
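A rough sketch of why these locality sample sizes are limiting. It assumes simple random sampling, which is a simplification (the PECS design is more complex), and uses the textbook standard error of a proportion, sqrt(p(1 - p)/n). The value p = 0.26 is the approximate overall share below the poverty line quoted later in the talk; the function name is illustrative.

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)

p = 0.26  # approximate overall poverty rate quoted later in the talk
# Locality sample sizes for women taken from the table above.
for label, n in [("min", 14), ("median", 35), ("max", 405)]:
    se = se_proportion(p, n)
    print(f"{label:>6}: n = {n:3d}, SE = {100 * se:.1f} percentage points")
```

Under these assumptions the standard error is close to 12 percentage points at n = 14 and about 2 percentage points at n = 405, so a direct estimate in the smallest localities carries very little information on its own.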


BASIC TERMINOLOGY

• Areas/domains: Subpopulations of interest.

• Direct estimator: Based only on the survey data from the target area/domain.

• Small area: Area/domain for which the considered direct estimator of the target indicator is inefficient (too large sampling error).
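As an illustration of a direct estimator, the sketch below computes a weighted share of sampled units below a threshold, using only the sample from one domain. The ratio form (weighted count divided by the sum of weights) is one common choice and is an assumption here, not necessarily the exact estimator used in the talk; the data and the function name are hypothetical.

```python
def direct_rate(values, weights, threshold):
    """Weighted share of sampled units in one domain with value below threshold."""
    below = sum(w for v, w in zip(values, weights) if v < threshold)
    return below / sum(weights)

# Hypothetical sample from a single domain: welfare values and survey weights.
values = [500.0, 1200.0, 800.0, 2000.0]
weights = [100.0, 150.0, 120.0, 130.0]

rate = direct_rate(values, weights, threshold=1000.0)
print(f"direct estimate: {rate:.2f}")
```

With only a handful of sampled units, one unit more or fewer below the threshold moves the estimate a lot, which is what makes the domain a small area in the sense defined above.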


INDIRECT ESTIMATION

• The idea is to borrow strength from the other areas.

• Use auxiliary data sources (ideally a census) that contain some variables related to our target variable and also observed in the survey.

• Assume that the target variable is related to the auxiliary variables in a similar way in all the areas (regression model).

• Include (random) area effects to account for unexplained between-area heterogeneity (mixed regression model).


MODEL-BASED ESTIMATION

• Fit the model with the survey data from all the areas.

• The total survey sample size is typically large, so we borrow a lot of strength.

• Use the fitted model to estimate in the small areas.

• Efficiency gains are typically great.


POVERTY INDICATORS

• $D$ areas/domains ($d = 1, \ldots, D$) of sizes $N_1, \ldots, N_D$.

• $E_{dj}$: welfare measure for individual $j$ in domain $d$.

• $z$ = poverty line.

• FGT poverty indicator of order $\alpha$ for domain $d$:

$$F_{\alpha d} = \frac{1}{N_d} \sum_{j=1}^{N_d} \left( \frac{z - E_{dj}}{z} \right)^{\alpha} I(E_{dj} < z), \quad \alpha \ge 0.$$

• When $\alpha = 0$ ⇒ Poverty rate (or at-risk-of-poverty rate)

• When $\alpha = 1$ ⇒ Poverty gap

✓ Foster, Greer & Thorbecke (1984), Econometrica
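The FGT definition above can be written directly as a short function. This is a minimal sketch for a fully observed domain; the welfare values and the poverty line are made up for illustration, and the function name is hypothetical.

```python
def fgt(welfare, z, alpha):
    """FGT indicator of order alpha for one domain with all units observed."""
    total = 0.0
    for e in welfare:
        if e < z:  # indicator I(E_dj < z)
            total += ((z - e) / z) ** alpha
    return total / len(welfare)

# Hypothetical domain of four individuals and poverty line z = 100.
welfare = [50.0, 80.0, 120.0, 200.0]
poverty_rate = fgt(welfare, z=100.0, alpha=0)  # share of individuals below z
poverty_gap = fgt(welfare, z=100.0, alpha=1)   # mean relative shortfall
print(poverty_rate, poverty_gap)
```

Here two of the four individuals are below the line, so the poverty rate is 0.5; their relative shortfalls are 0.5 and 0.2, so the poverty gap is (0.5 + 0.2)/4 = 0.175.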


NESTED ERROR MODEL

• In our application, as the welfare measure of individuals we use household expenditure per adult equivalent.

• The distribution of expenditures $E_{dj}$ is highly right-skewed.

• We need to transform expenditures so that the distribution is approximately Normal: $y_{dj} = \log(E_{dj} + k)$

• We consider a nested error model for $y_{dj}$:

$$y_{dj} = x_{dj}'\beta + u_d + e_{dj}, \quad j = 1, \ldots, N_d, \quad d = 1, \ldots, D,$$

$$u_d \overset{iid}{\sim} N(0, \sigma_u^2), \quad e_{dj} \overset{iid}{\sim} N(0, \sigma_e^2)$$

• We obtain the empirical best (EB) predictor of the target indicator for each area of interest. (✓ Molina & Rao (2010), CJS)
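A minimal sketch of how a best predictor of the poverty rate can be approximated by Monte Carlo for one sampled area under the nested error model above. It is not the authors' implementation: the model parameters are treated as known (in the EB predictor they are replaced by estimates fitted to the survey data), all numerical values are hypothetical, and the function and argument names are illustrative. The sketch uses the standard result that, given the area's sample, the area effect is Normal with mean $\gamma_d(\bar{y}_{ds} - \bar{x}_{ds}'\beta)$ and variance $\sigma_u^2(1 - \gamma_d)$, where $\gamma_d = \sigma_u^2 / (\sigma_u^2 + \sigma_e^2 / n_d)$.

```python
import math
import random

def eb_poverty_rate(y_s, xb_s, xb_r, sigma_u2, sigma_e2, z, k=0.0,
                    n_rep=200, seed=0):
    """Monte Carlo approximation to the best predictor of the poverty rate
    (FGT with alpha = 0) in one sampled area, parameters treated as known.

    y_s  : transformed welfare y = log(E + k) of the sampled units
    xb_s : linear predictors x'beta of the sampled units
    xb_r : linear predictors x'beta of the non-sampled units (census)
    """
    rng = random.Random(seed)
    n_d = len(y_s)
    # Shrinkage factor: weight given to the area's own sample residuals.
    gamma = sigma_u2 / (sigma_u2 + sigma_e2 / n_d)
    # Conditional mean and standard deviation of the area effect u_d.
    u_hat = gamma * sum(y - m for y, m in zip(y_s, xb_s)) / n_d
    sd_u = math.sqrt(sigma_u2 * (1.0 - gamma))
    sd_e = math.sqrt(sigma_e2)

    # Sampled units are observed, so their poverty status is known.
    poor_s = sum(1 for y in y_s if math.exp(y) - k < z)
    n_total = n_d + len(xb_r)

    rates = []
    for _ in range(n_rep):
        v = rng.gauss(0.0, sd_u)  # one area-effect draw per replicate
        poor_r = 0
        for m in xb_r:
            y = m + u_hat + v + rng.gauss(0.0, sd_e)
            if math.exp(y) - k < z:  # back-transform to expenditure
                poor_r += 1
        rates.append((poor_s + poor_r) / n_total)
    return sum(rates) / n_rep

# Hypothetical area: 4 sampled units, 300 non-sampled units from the census.
y_s = [math.log(e) for e in [700.0, 950.0, 1500.0, 2200.0]]
xb_s = [7.0] * 4
xb_r = [7.0] * 300
rate = eb_poverty_rate(y_s, xb_s, xb_r, sigma_u2=0.05, sigma_e2=0.25, z=1000.0)
print(round(rate, 3))
```

Because the non-sampled units are simulated from the fitted model and the census covariates, the prediction is never based on the four sampled units alone, which is where the gain over the direct estimator comes from.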


DATA DESCRIPTION

• Data: Palestinian Expenditure Consumption Survey (PECS) from 2016/2017 and Population Census from 2017.

• Target: Estimate poverty rates and gaps for Palestinian localities by gender.

• Areas: In census, 319 localities → D = 162 in survey. We compute estimates for each sampled locality by gender.

• Welfare measure: $E_{dj}$, monthly expenditure per adult equivalent (ILS).

• Poverty line: z = 10,027 ILS → approx. 26 % of the population below the poverty line.


FITTED MODEL

• We fit a separate model for each gender.

• Explanatory variables:

  ✓ Indicators of region (Gaza, West Bank), type of locality (rural/urban, camp).
  ✓ Household characteristics (size, prop. females, employed ratio).
  ✓ Household head characteristics (unemployed, employisrasett, employnatgov, refugstat, diff, neverschool, secondabove).
  ✓ Dwelling characteristics (type, tenure, num. rooms).
  ✓ Supplies (water, waste, heating systems, freezer, etc.).


MODEL CHECKING

• Model coefficients have reasonable signs.

• All covariates have significant categories for both genders.

• Explanatory power: $R^2$ = 53.6 % for both genders.

• The data show no evidence against normality of the model residuals, linearity, or homoscedasticity. The model seems to fit well.


COMPARISON BY REGION

✓ Median Pov. Rate: Gaza 55 %, West Bank 8.3 %
✓ Median Pov. Gap: Gaza 17.4 %, West Bank 1.5 %

[Figure: two panels, "Poverty Rate" and "Poverty Gap". x-axis: Region (Gaza, West Bank); y-axes: EB Estimates Poverty Rate (%) and EB Estimates Poverty Gap (%).]


QUALITY EB vs. DIRECT: POV. RATE

✓ Median MSE Women: Direct 47, EB 6.7
✓ Median MSE Men: Direct 45.8, EB 5.5

[Figure: two panels, "MSE: Women" and "MSE: Men", each comparing Direct and EB.]
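For a sense of scale, the median MSE values on this slide can be turned into relative reductions. This is only a back-of-the-envelope sketch: a ratio of medians is not the same as an average of per-locality reductions, so the figures need not match the averages reported elsewhere in the talk. The function name is illustrative.

```python
def relative_reduction(mse_direct, mse_eb):
    """Relative MSE reduction of EB with respect to the direct estimator."""
    return 1.0 - mse_eb / mse_direct

# Median MSE values reported on this slide: (direct, EB).
medians = {"Women": (47.0, 6.7), "Men": (45.8, 5.5)}
for gender, (direct, eb) in medians.items():
    print(f"{gender}: {100 * relative_reduction(direct, eb):.1f} % lower median MSE")
```

This gives roughly 86 % for women and 88 % for men, measured on the medians.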


MSE EB vs. DIRECT POV. RATE: WEST BANK

✓ Reduction in all but one locality, 84 % average MSE reduction!

[Figure: Estimated MSE Pov. Rate by Locality (sample size in labels, from 14 to 104). Legend: mse(Weighted Direct), mse(EB).]


MSE EB vs. DIRECT POV. RATE: GAZA

[Figure: Estimated MSE Pov. Rate by Locality (sample size in labels, from 21 to 319). Legend: mse(Weighted Direct), mse(EB).]

✓ Great gains also for Pov. Gap (not shown)!


ESTIMATED POV. RATE: WOMEN

[Figure: two panels, "West Bank" and "Gaza". Estimates Pov. Rate by Locality (sample size in labels, from 14 to 104 for West Bank and from 21 to 319 for Gaza). Legend: Weighted Direct, EB.]


EB POV. RATE BY GENDER: WEST BANK

[Figure: Estimates Pov. Rate by Locality (sample size in labels, from 14 to 104). Legend: Women, Men.]


EB POV. RATE BY GENDER: GAZA

[Figure: Estimates Pov. Rate by Locality (sample size in labels, from 21 to 319). Legend: Women, Men.]


EB POV. GAP BY GENDER: WEST BANK

[Figure: Estimates Pov. Gap by Locality (sample size in labels, from 14 to 104). Legend: Women, Men.]


EB POV. GAP BY GENDER: GAZA

[Figure: Estimates Pov. Gap by Locality (sample size in labels, from 21 to 319). Legend: Women, Men.]


CONCLUSIONS

• Using census data in a model allows us to obtain much more efficient estimates.

• The model-based methodology considered allows disaggregation at any desired level.

• We can estimate any indicator that is a function of expenditure.

• The model considered fits these data rather well.

• The efficiency gains of model-based estimators over direct estimators are remarkable (over 82 % reduction in MSE for pov. rates and gaps).


CONCLUSIONS

• Direct estimates are equal to zero for many localities (32 for men, 29 for women) and are highly unstable.

• EB estimates are never zero and are much more stable. There is perhaps some underestimation in a few localities; model variations can be explored further.

• Gaza has much larger pov. rates and gaps. Perhaps a different pov. line could be used.

• No great differences between men and women, although women have slightly greater estimates in about 70 % of the localities in the West Bank.


✓ MANY THANKS TO UN-ESCWA AND PCBS FOR GREAT DATA PREPARATION!

✓ THANK YOU ALL FOR YOUR ATTENTION!
