info 4470/ilrle 4470 visualization tools and data quality john m. abowd and lars vilhuber march 16,...
DESCRIPTION
Coverage What is the target (theoretical) population? Does the statistical frame fully cover this population? What are the known deficiencies in the frame? Can these be addressed with other data? 3/16/2011 (c) John M. Abowd and Lars Vilhuber 2011, all rights reserved 3TRANSCRIPT
![Page 1: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/1.jpg)
INFO 4470/ILRLE 4470 Visualization Tools and Data
Quality
John M. Abowd and Lars VilhuberMarch 16, 2011
![Page 2: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/2.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
2
Outline
• Data quality issues• Coverage• Known biases• Missing data• Extended example using the Quarterly
Workforce Indicators
3/16/2011
![Page 3: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/3.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
3
Coverage
• What is the target (theoretical) population?• Does the statistical frame fully cover this
population?• What are the known deficiencies in the
frame?• Can these be addressed with other data?
3/16/2011
![Page 4: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/4.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
4
Biases and Sampling Error
• Bias: the statistical model does not estimate the quantity of interest in expected value
• Sampling error: the statistical model has imprecision in estimating the quantity of interest
• Non-sampling error: variation in the statistical model introduced by editing, imputation, non-random non-response, etc.
3/16/2011
![Page 5: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/5.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
5
What are Missing Data?
• Random or systematic missing items or entities in a statistical sample or frame
• One of the most ubiquitous data quality problems known
• Cannot be ignored, or rather
3/16/2011
![Page 6: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/6.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
6
How to code missing data?
• How about giving a zero, “0” for missing?• Note with a distinctive value such as:– -9 for non-responses– -8 for invalid responses– Or even “.”
• Statistical software like SAS, Stata and SPSS can then recognize those values as missing and treat them according to your instructions
3/16/2011
![Page 7: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/7.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
7
How to Handle Missing Data
• It depends on the type of analysis:– Simple summary tabulations– Descriptive stats, including mean and standard
deviation– Correlation, relation between two or more
variables– Model to estimate the effects of characteristics
3/16/2011
![Page 8: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/8.jpg)
Missing vs Non-missing
"There is no difference between cases with missing versus observed
data"
![Page 9: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/9.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
9
“Inference and Missing Data” Rubin (1976)
• MCAR – Missing Completely At Random– Rigorous and hard to achieve– Sometimes by design
• MAR – Missing At Random– Less Rigorous– Cannot be tested without the data which are
missing
3/16/2011
![Page 10: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/10.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
10
Assume MAR?
• YES – then mechanism by which data are missing is IGNORABLE
• NO – then mechanism is not ignorable, requires more involved procedures
3/16/2011
![Page 11: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/11.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
11
MAR - YES
Delete• Listwise
• Pairwise
Impute• Assign Mean
• Hot Deck
• Maximum Likelihood
• Multiple Imputation
3/16/2011
![Page 12: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/12.jpg)
Visualization of Missing Data in the Quarterly Workforce Indicators
![Page 13: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/13.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
13
Data Sources (Quarterly Only)
• Quarterly Workforce Indicators (Census Bureau)
• Quarterly Census of Employment and Wages (BLS)
• Business Employment Dynamics (BLS)• Job Openings and Labor Turnover Survey• Adjustments by to JOLTS by Davis, Faberman,
Haltiwanger, and Rucker (2010)
3/16/2011
![Page 14: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/14.jpg)
Quarterly Workforce Indicators I
• Flows are based on longitudinally linked (by employer and employee) Unemployment Insurance Wage Records
• Beginning-of-quarter employed if wage record with earnings > $1.00 in quarters t-1 and t (B)
• End-of-quarter employed if wage record with earnings > $1.00 in quarters t and t+1 (E)
• Accession if wage record in t but not t-1 (A)• Separation if wage record in t but not t+1 (S)
![Page 15: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/15.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
15
Quarterly Workforce Indicators II
• Job creation if establishment has positive employment change from beginning to end of quarter (JC)
• Job destruction if establishment has negative employment change from beginning to end of quarter (JD), always stated as absolute value of change
• Demographic (age x gender), geographic (county, CBSA, WIB), NAICS (sector, sub-sector, industry group), and ownership (All, private)3/16/2011
![Page 16: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/16.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
16
Quarterly Census of Employment and Wages
• Stocks of employment measured as of the 12th day of the month for each month in the quarter for each establishment (no job level data)
• BLS uses month-3 employment to measure changes• Beginning of quarter employment is month-3
employment from quarter t-1• End of quarter employment is month-3 employment
for quarter t• Geographic (county, MSA), NAICS (sector, sub-sector,
industry group), and ownership (all, public, private)
3/16/2011
![Page 17: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/17.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
17
Business Employment Dynamics
• Gross job gains (job creations) is the change in employment at an establishment between month-3 in quarter t-1 and month-3 in quarter t, if positive
• Gross job losses (job destructions) is the absolute value of the change in employment at an establishment between month-3 in quarter t-1 and month-3 in quarter t, if negative
• Limited geography (state), NAICS (sector) detail3/16/2011
![Page 18: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/18.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
18
Job Openings and Labor Turnover Survey
• Monthly survey of continuing establishments• Accessions measured as all new employment over
the course of the month (summed over the three months in the quarter here)
• Separations measured as quits, layoffs, discharges, and other over the course of the month (summed over the three months of the quarter here)
• Limited geography (national) and NAICS (sector) detail
3/16/2011
![Page 19: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/19.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
19
QWI Coverage of the Private Workforce
Sub-period 1
Sub-period 2
3/16/2011
![Page 20: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/20.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
20
Worker Reallocation Rate
• This rate is available in the QWI for 8 age groups, both genders, NAICS sector, state and quarter from 1990:Q1 to 2008:Q4 (more detail is available than we used)
2agsktagskt
agsktagsktagskt EB
SAWRR
3/16/2011
![Page 21: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/21.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
21
Job Reallocation Rate
• This rate is available in the QWI for 8 age groups, both genders, NAICS sector, state and quarter from 1990:Q1 to 2008:Q4 (more detail is available than we used)
2agsktagskt
agsktagsktagskt EB
JDJCJRR
3/16/2011
![Page 22: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/22.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
22
Excess Reallocation Rate
• ERR = WRR – JRR• The excess reallocation rate measures the extent
to which gross worker flows exceed the minimum required to service the gross job flows
• This has been very difficult to estimate nationally because there were no data collected on a consistent basis for all the component flows
• QWI solved that problem
3/16/2011
![Page 23: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/23.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
23
Statistical Methodology
• Divide the analysis into two periods– 1993:Q1-2001:Q4 (early period, many states are completely
missing, 10 states complete)– 1999:Q1-2008:Q4 (later period, 37 states are complete)
• For each sub-period use a multiple imputation model to complete the missing data
• For the overlap period, use a ramped weight to compute the average implicate combining the two periods
• Use the standard multiple imputation formulae to combine implicates
3/16/2011
![Page 24: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/24.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
24
National Estimates
• The combining formula for producing the national WRR is shown above (similar formulae apply to other rates)
WRRag+kt = 1MP M
`=1
2664P
8s
µB ( `)agsk t +E ( `)
agsk t2
¶WRR ( `)
agsk t
P8v
µB ( `)agvk t +E ( `)
agvk t2
¶
3775
agkstsk
agkstagkst
agtagt
agtagt
agtagtagt
WRREB
EB
EBSA
WRR
, 22
2
3/16/2011
![Page 25: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/25.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
25
Implicate Combining Formulae I
V (`)hdWRRag+kt
i= 1
49P
8s
µB (`)agsk t +E (`)
agsk t2
¶³WRR (`)
agsk t ¡ dWRR ( `)ag+k t
´ 2
P8v
µB (`)agvk t +E ( `)
agvk t2
¶ :
B £WRRag+kt¤= 1
M ¡ 1P M
`=1
µdWRR (`)
ag+kt ¡ WRRag+kt
¶2
M
s ktagktag
agsktagsktagskt
agktEB
WRREB
MWRR
1
2
21
s ktagktag
agsktagsktagskt
agkt EB
WRREB
RRW
2
2
3/16/2011
![Page 26: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/26.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
26
Implicate Combining Formulae II
T £WRRag+kt¤= 1
MP M
`=1V (`)hdWRRag+kt
i+ M +1
M B £WRRag+kt¤
MR £WRRag+kt¤= B [WRR ag+k t ]
T [WRR ag+k t ]
df £WRRag+kt¤= (M ¡ 1)
µ1+ 1
M +11MP M
`=1 V( `)£dWRR ag+k t
¤B [WRR ag+k t ]
¶2:
s ktagktag
agktagsktagsktagskt
agkt EB
RRWWRREB
RRWV
2
2491
2
M
agktagktagkt WRRRRWM
WRRB1
2
11
agkt
M
agktagkt WRRBMMRRWV
MWRRT
1
11
3/16/2011
![Page 27: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/27.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
27
Implicate Combining Formulae III
T £WRRag+kt¤= 1
MP M
`=1V (`)hdWRRag+kt
i+ M +1
M B £WRRag+kt¤
MR £WRRag+kt¤= B [WRR ag+k t ]
T [WRR ag+k t ]
df £WRRag+kt¤= (M ¡ 1)
µ1+ 1
M +11MP M
`=1 V( `)£dWRR ag+k t
¤B [WRR ag+k t ]
¶2:
agkt
agktagkt
WRRTWRRBWRRMR
3/16/2011
![Page 28: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/28.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
28
Results
• Comparison of WRR between QWI and JOLTS• Comparison of JRR between QWI and BED• Demographic detail• Industry detail
3/16/2011
![Page 29: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/29.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
29
WRR: QWI v. JOLTS
3/16/2011
![Page 30: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/30.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
30
WRR: QWI v. Adjusted JOLTS
3/16/2011
![Page 31: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/31.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
31
JRR: QWI v. BED
3/16/2011
![Page 32: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/32.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
32
JRR WRR ERR by Gender
3/16/2011
![Page 33: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/33.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
33
ERR (Churning) by Age Group
3/16/2011
![Page 34: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/34.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
34
Job Creation Rate
3/16/2011
![Page 35: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/35.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
35
Job Destruction Rate
3/16/2011
![Page 36: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/36.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
36
Job Reallocation Rate
3/16/2011
![Page 37: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/37.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
37
Accession Rate
3/16/2011
![Page 38: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/38.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
38
Separation Rate
3/16/2011
![Page 39: INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011](https://reader030.vdocuments.mx/reader030/viewer/2022020921/5a4d1b6c7f8b9ab0599b3a4f/html5/thumbnails/39.jpg)
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
39
Worker Reallocation Rate
3/16/2011