2014 planning database (pdb)
![Page 1: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/1.jpg)
What is the Planning Database (PDB), and How Can I Use It?
April 7, 2016
Nancy Bates, Kathleen Kephart, Suzanne McArdle
Center for Survey Measurement
![Page 2: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/2.jpg)
Acknowledgements
• Thank you to Travis Pape and Julia Coombs for creating the code to generate the PDB
• Luke Larsen and Alina Kline for their work on the upcoming 2016 PDB
• Nancy Bates and Barb O’Hare for their time and effort to bring the PDB back
• Suzanne McArdle for her work on PDB data visualizations
![Page 3: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/3.jpg)
Overview
• A “greatest hits” of ACS 5-year estimates and 2010 Census variables
• Pulls together publicly available estimates in one convenient file
• Available at two levels of geography: Tract and Block Group
• Publicly available in CSV and now API format
![Page 4: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/4.jpg)
Background
• First PDB developed for 2000 Census planning
  • Selected 1990 Census tract data in easy-to-use format
  • Hard-to-Count Score
• ACS annual 5-year estimates for block groups resulted in revised PDB in 2012
• 2015 PDB
  • Latest 5-year ACS estimates
  • Health Insurance Coverage Estimates
  • An API version of the data for developers
![Page 5: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/5.jpg)
Contents of the 2015 PDB
• Both 2009-2013 5-year ACS estimates and 2010 Census data
• Types of variables
  • Population: gender, age, education, poverty
  • Household: language, relationship, income
  • Housing unit: tenure, number of units
  • Census operational: mailout/mailback, bilingual
![Page 6: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/6.jpg)
A Broad Scope of Uses
Useful for:
• Identifying areas with likely low survey response rates
• Stratifying small areas
• Creating thematic maps
• Enhancing reports with population metrics
• Creating applications
![Page 7: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/7.jpg)
Access
• Available on the Census Bureau’s Research @ Census page
• Link to the PDB
  • CSV format: http://www.census.gov/research/data/planning_database/
  • API format: www.census.gov/developers
• Documentation describing the files in PDF format
![Page 8: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/8.jpg)
Navigation to the PDB CSV Format
From the Census Bureau internet site (http://www.census.gov):
1. Select “Our Research” from under the “About the Bureau” menu at the top of the page
2. Select the “Data” tab
3. Select the “Research Data Products” link
4. Select “Planning Database” under the “Demographic – People and Households” heading
5. Select the appropriate year under “Data and Documentation”
![Page 9: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/9.jpg)
Navigation to the PDB API Format
From the Census Bureau internet site (http://www.census.gov):
1. Select “Data”
2. Select “Developers”
3. Select “Available APIs” from the sidebar
4. Scroll down and select “The 2015 Planning Database”
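Once the API page is located, a request is just a URL with a variable list and a geography filter. The sketch below only builds such a URL and does not send it. The base address, the `blockgroup`/`tract` path segments, and the variable name `Low_Response_Score` are assumptions that should be checked against the developers page; `build_pdb_query` is a hypothetical helper, not part of any Census library.

```python
from urllib.parse import urlencode

# Assumed base URL and dataset path; confirm on the developers page.
API_BASE = "https://api.census.gov/data/2015/pdb"


def build_pdb_query(level, variables, state_fips, county_fips):
    """Build a query URL for all block groups or tracts in one county."""
    if level not in ("blockgroup", "tract"):
        raise ValueError("level must be 'blockgroup' or 'tract'")
    geography = "block group" if level == "blockgroup" else "tract"
    params = {
        "get": ",".join(variables),          # comma-separated variable names
        "for": f"{geography}:*",              # every unit at this level
        "in": f"state:{state_fips} county:{county_fips}",
    }
    return f"{API_BASE}/{level}?{urlencode(params)}"


# State FIPS 11 is DC; the variable names are assumed.
url = build_pdb_query("tract", ["GIDTR", "Low_Response_Score"], "11", "001")
print(url)
```

The helper percent-encodes the parameters, so the printed URL can be pasted into a browser or passed to an HTTP client.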
![Page 10: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/10.jpg)
Managing the PDB
![Page 11: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/11.jpg)
It’s a BIG dataset
• Block Group Level: 220,354 block groups × 344 variables = ~75.8 million cells
• Tract Level: 74,021 tracts × 566 variables = ~41.9 million cells
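With files this wide, one practical option is to read only the columns needed for a given task. The following is a minimal sketch using pandas, assuming the file has a `GIDBG` identifier column; `load_pdb_columns` is a hypothetical helper, and the in-memory sample stands in for the real CSV with made-up values.

```python
import io

import pandas as pd


def load_pdb_columns(source, columns, id_column="GIDBG"):
    """Read only the requested columns, keeping the geography ID as text."""
    wanted = [id_column] + [c for c in columns if c != id_column]
    # Reading the ID as a string preserves leading zeros in state codes.
    return pd.read_csv(source, usecols=wanted, dtype={id_column: str})


# Tiny stand-in for the real file; all values are made up.
sample = io.StringIO(
    "GIDBG,Males_CEN_2010,Females_CEN_2010,Extra\n"
    "010010201001,300,320,x\n"
    "110010001001,410,455,y\n"
)
subset = load_pdb_columns(sample, ["Males_CEN_2010"])
print(subset)
```

Passing a file path instead of the `StringIO` object works the same way.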
![Page 12: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/12.jpg)
The Structure
• Geography Identifiers
  • GIDBG (12 chars) = State (2 chars) + County (3 chars) + Tract (6 chars) + Block Group (1 char)
  • GIDTR (11 chars) = State (2 chars) + County (3 chars) + Tract (6 chars)
• Demographic, Socioeconomic, and Housing data
  • Order of variables is consistent: Census data first, followed by ACS estimates and ACS MOEs.
  • For example, Males_CEN_2010, Males_ACS_09_13, Males_ACSMOE_09_13
• Census Operational data, including Mail Return Rate and Low Response Score
• Percentages and MOE Percentages, listed in the same order as their respective estimate
  • Variables identified with ‘pct_’ added to their variable name.
  • For example, pct_Males_CEN_2010, pct_Males_ACS_09_13, pct_Males_ACSMOE_09_13
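The identifier layout and naming pattern described above can be sketched in a few lines. `split_gid` and `variable_family` are hypothetical helpers written for illustration, and the sample identifier is made up apart from its length and layout.

```python
def split_gid(gid):
    """Split a GIDTR (11 chars) or GIDBG (12 chars) into its parts."""
    if len(gid) not in (11, 12) or not gid.isdigit():
        raise ValueError("expected an 11- or 12-digit identifier")
    parts = {"state": gid[:2], "county": gid[2:5], "tract": gid[5:11]}
    if len(gid) == 12:
        parts["block_group"] = gid[11]  # final character of a GIDBG
    return parts


def variable_family(stem, acs_years="09_13", pct=False):
    """List the Census, ACS, and ACS MOE names for one variable stem."""
    prefix = "pct_" if pct else ""
    return [
        f"{prefix}{stem}_CEN_2010",
        f"{prefix}{stem}_ACS_{acs_years}",
        f"{prefix}{stem}_ACSMOE_{acs_years}",
    ]


print(split_gid("110010001001"))
print(variable_family("Males"))
print(variable_family("Males", pct=True))
```

Keeping identifiers as strings matters here: a state code such as 01 loses its leading zero if the column is read as a number.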
![Page 13: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/13.jpg)
Low Response Score (Erdman and Bates slides)
![Page 14: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/14.jpg)
Low Response Score for Use in Survey and Census Planning and Analysis
Chandra Erdman and Nancy Bates U.S. Census Bureau
Disclaimer: The views expressed on statistical issues are those of the authors only.
![Page 15: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/15.jpg)
Overview
1. The original Hard-to-Count (HTC) Score
2. The Census Kaggle Challenge
3. The Low Response Score (LRS)
Erdman & Bates (2014) Low Response Score (LRS)
![Page 16: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/16.jpg)
The Original HTC Score
Bruce et al. (2001); Bruce and Robinson (2003)
1. Renter occupied units
2. Unmarried
3. Vacant units
4. Multi-unit structures
5. Below poverty
6. Not high school graduate
7. Different housing unit 1 year ago
8. Public assistance
9. Unemployed
10. Crowded units
11. Linguistically isolated households
12. No phone service
![Page 17: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/17.jpg)
The Census Kaggle Challenge - 2012
“All you need is data and a question. Our data scientists will provide the answer.” – Kaggle.com
• Data: 2012 Block-Group-Level Planning Database (PDB)
• Question: Which statistical model best predicts 2010 Census mail return rates?
• Product: Updated model-based “Hard-to-Count” Score
![Page 18: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/18.jpg)
The Census Kaggle Challenge (Cont.)
• 2009 America COMPETES Act
• Contest ran August 31 - November 1, 2012
• 244 teams and individual competitors
• Software developer from MD won top prize
![Page 19: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/19.jpg)
Winning Model Predictors
When ranked by relative influence, 24/25 top predictors from PDB
[Figure: relative influence (axis from 0 to 50) plotted against predictor rank. Top three predictors: (1) Renter, (2) Ages 18−24, (3) Female head of household, no husband.]
![Page 20: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/20.jpg)
Low Response Model (Block-Group)

| Variable | Coef | Sig |
|---|---|---|
| (Intercept) | 10.29 | *** |
| Renter occupied units | 1.08 | *** |
| Ages 18-24 | 0.64 | *** |
| Female head, no husband | 0.58 | *** |
| Non-Hispanic White | -0.77 | *** |
| Ages 65+ | -1.21 | *** |
| Related child <6 | 0.46 | *** |
| Males | 0.09 | *** |
| Married family households | -0.12 | *** |
| Ages 25-44 | -0.06 | |
| Vacant units | 1.08 | *** |
| College graduates | -0.32 | *** |
| Median household income | 0.24 | *** |
| Ages 45-64 | -0.08 | * |
| Persons per household | 3.44 | *** |
| Moved in 2005-2009 | 0.09 | *** |
| Hispanic | 0.41 | *** |
| Single unit structures | -0.52 | *** |
| Population Density | -0.40 | *** |
| Below poverty | 0.11 | *** |
| Different HU 1 year ago | -0.12 | *** |
| Ages 5-17 | 0.17 | *** |
| Black | -0.04 | ** |
| Single person households | -0.24 | *** |
| Not high school grad | -0.06 | *** |
| Median house value | 0.71 | *** |

Sig: *** p < .001; ** .001 ≤ p < .01; * .01 ≤ p < .05
R-squared: 56.10%, n = 217,417
![Page 21: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/21.jpg)
Distribution of the LRS
[Figure: histogram of the Low Response Score (x-axis, 0 to 50) against the number of block groups (y-axis, 0 to 25,000).]
Rule of thumb…areas with LRS = >29 are hardest to count?
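A cutoff like this is easy to apply once the score is in a table. The sketch below is one possible reading, not the authors' procedure: it treats the slide's "= >29" as "29 or above", which is an assumption, and it assumes the score sits in a column named `Low_Response_Score`. `flag_hard_to_count` is a hypothetical helper, and the sample rows reuse made-up labels.

```python
import pandas as pd


def flag_hard_to_count(df, score_column="Low_Response_Score", cutoff=29.0):
    """Add a boolean column marking rows at or above the cutoff."""
    out = df.copy()
    # Missing scores compare as False, so they are never flagged.
    out["hard_to_count"] = out[score_column] >= cutoff
    return out


# Illustrative rows only; area labels and scores are placeholders.
sample = pd.DataFrame({
    "area": ["A", "B", "C", "D"],
    "Low_Response_Score": [32.0, 38.0, 37.0, 18.5],
})
flagged = flag_hard_to_count(sample)
print(flagged)
```

Changing `cutoff` makes it simple to see how sensitive the flagged set is to the rule of thumb.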
![Page 22: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/22.jpg)
![Page 23: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/23.jpg)
![Page 24: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/24.jpg)
LRS/PDB Example: Three HTC Blocks in DC
• Columbia Heights: 43% Hispanic; 36% Other Language; 92% 10+ multi-units; 64% non-family hhds; 85% renters; 60% moved 5 years; LRS=32
• Anacostia: 98% Black; 46% below poverty; 89% single unit homes; 15% non-family hhds; 21% moved 5 years; 93% renters; LRS=38
• Trinidad: 37% Ages 18-24; 59% moved 5 years; 33% below poverty; 28% vacant; 55% Black; 31% White; 87% renters; LRS=37
![Page 25: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/25.jpg)
Considerations
• Dependent variable is mail response; 2020 Census will have an Internet response option
• “Single Unattached Mobiles” (Bates and Mulry, 2011)
• 64.7 percent of American Community Survey self response by Internet (Baumgardner, 2013)
• In January 2013, ACS began asking about Internet connectivity
![Page 26: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/26.jpg)
Summary
• New “hard to count” metric for tracts and block groups
• Winning model was complex, but predictors in rank order of influence proved useful
• Accurate predictions with relatively few predictors
• Useful for planning and targeted advertising
• LRS updated yearly to reflect changes
• Develop mapping app populated with PDB and LRS?
![Page 27: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/27.jpg)
Examples Using the PDB
![Page 28: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/28.jpg)
Area Demographics
619,371 people live in 179 tracts in DC

| Measure | DC* | United States* |
|---|---|---|
| Male to female ratio | 0.90 | 0.97 |
| Population under 5 years old | 5.9% | 6.4% |
| Population that identifies as Hispanic | 9.6% | 16.6% |
| Population that moved within the past year | 19.4% | 15.1% |
| Population that was not born in the US | 13.8% | 12.9% |

*ACS 5 year 2009-2013
![Page 29: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/29.jpg)
Using Excel to Analyze Demographics
I used the Excel function SUM() on all DC tracts to find the total Census population
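For readers working outside Excel, the same total can be sketched in pandas. This assumes a tract-level table with a `GIDTR` identifier and a population column named `Tot_Population_CEN_2010`; the column name is an assumption to check against the documentation, `total_population` is a hypothetical helper, and the sample values are made up.

```python
import pandas as pd


def total_population(tracts, state_fips, gid_column="GIDTR",
                     pop_column="Tot_Population_CEN_2010"):
    """Sum the population column over tracts in one state."""
    # The first two characters of GIDTR are the state code.
    in_state = tracts[gid_column].str.startswith(state_fips)
    return int(tracts.loc[in_state, pop_column].sum())


# Made-up tracts: two with state code 11 (DC), one from another state.
sample = pd.DataFrame({
    "GIDTR": ["11001000100", "11001000201", "24031700101"],
    "Tot_Population_CEN_2010": [4890, 3916, 5000],
})
dc_total = total_population(sample, "11")
print(dc_total)
```

Filtering on the identifier prefix plays the role of selecting the DC rows before applying SUM() in the spreadsheet.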
![Page 30: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/30.jpg)
2016 Census Test: Harris County, Texas Demographics
484,358 people live in 292 block groups in the test site

| Measure | Houston* | United States* |
|---|---|---|
| Households where no one over 14 speaks English “very well” | 14.8% | 4.6% |
| Population 18-24 years old | 9.4% | 10.0% |
| Renter Occupied Units | 60.9% | 35.1% |
| Population 25 and over, with less than a HS diploma | 19.1% | 13.9% |

*ACS 5 year 2009-2013 Estimate
![Page 31: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/31.jpg)
![Page 32: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/32.jpg)
Linguistic Isolation
• What if you want to identify areas that may need support for a language other than English?
• Find block groups in the area that have a high percentage of housing units where no one over the age of 14 speaks English “very well”
• What language is spoken in these tracts?
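The ranking step described here can be sketched as a sort on one percentage column, followed by a lookup of the largest language share. The column names below follow the ‘pct_’ naming pattern but are assumptions to verify against the documentation; `top_isolated` is a hypothetical helper, and every value in the sample is made up.

```python
import pandas as pd

# Assumed column names; verify against the PDB documentation.
TOTAL = "pct_ENG_VW_ACS_09_13"
BY_LANGUAGE = {
    "pct_ENG_VW_SPAN_ACS_09_13": "Spanish",
    "pct_ENG_VW_API_ACS_09_13": "Asian/Pacific Islander",
    "pct_ENG_VW_OTHER_ACS_09_13": "Other",
}


def top_isolated(block_groups, n=5):
    """Return the n block groups with the highest isolated share."""
    top = block_groups.sort_values(TOTAL, ascending=False).head(n).copy()
    # Label each row with the language column holding the largest share.
    top["main_language"] = top[list(BY_LANGUAGE)].idxmax(axis=1).map(BY_LANGUAGE)
    return top


# Made-up block groups and percentages, for illustration only.
sample = pd.DataFrame({
    "GIDBG": ["000000000001", "000000000002", "000000000003"],
    TOTAL: [12.0, 77.2, 40.5],
    "pct_ENG_VW_SPAN_ACS_09_13": [2.0, 73.4, 10.5],
    "pct_ENG_VW_API_ACS_09_13": [9.0, 3.8, 30.0],
    "pct_ENG_VW_OTHER_ACS_09_13": [1.0, 0.0, 0.0],
})
ranked = top_isolated(sample, n=2)
print(ranked[["GIDBG", TOTAL, "main_language"]])
```

Because these are ACS estimates, the matching margin-of-error columns are worth reading alongside the ranking before drawing conclusions for small areas.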
![Page 33: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/33.jpg)
Linguistically Isolated BGs in 2016 Census Harris TX Test Site

| Rank | BG | No one speaks English “very well” | Spanish | Asian/Pacific Islander | Other |
|---|---|---|---|---|---|
| 1 | 4327012 | 81.4% (14.3) | 81.4% (14.0) | 0% (2.1) | 0% (2.1) |
| 2 | 4330012 | 77.2% (13.4) | 73.4% (13.5) | 3.8% (4.1) | 0% (2.3) |
| 3 | 4327011 | 72.5% (11.1) | 72.5% (10.9) | 0% (1.6) | 0% (1.6) |
| 4 | 4335012 | 69.3% (10.9) | 66.1% (10.7) | 0% (1.7) | 3.2% (4.8) |
| 5 | 5214001 | 69.3% (21.1) | 69.3% (20.6) | 0% (3.7) | 0% (3.7) |
![Page 34: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/34.jpg)
JSM Govt Section Data Challenge
• Tailoring Outreach to Boost Mail Self-Response in Geographic Areas with Similar Low Response Scores (Darryl Creel)
• Exploring the Census Bureau's 2014 Planning Database Using Topological Data Analysis (Robert Baskin)
• Informing Natural Disaster Response with Census Data (Jonathan Auerbach; Christopher Eshleman, New York City Council)
• Optimizing Survey Cost-Error Tradeoffs: A Multiple Imputation Strategy Using the Census Planning Database (Shin-Jung Lee, University of Michigan)
![Page 35: 2014 Planning Database (PDB)](https://reader033.vdocuments.mx/reader033/viewer/2022051302/5895b7371a28ab195b8be27b/html5/thumbnails/35.jpg)
Important Note
• Why are there duplicate tracts and BGs in the PDB?
• Short answer: Changes in geography since 2010
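Before aggregating, it can help to list the affected rows. The sketch below assumes that duplicates show up as repeated values in the identifier column, which the slide does not confirm; `rows_with_repeated_ids` is a hypothetical helper, the population column name is assumed, and the sample data is made up.

```python
import pandas as pd


def rows_with_repeated_ids(df, id_column="GIDBG"):
    """Return every row whose identifier appears more than once."""
    # keep=False marks all copies, not just the second and later ones.
    repeated = df[df.duplicated(subset=id_column, keep=False)]
    return repeated.sort_values(id_column)


# Made-up rows: the first identifier appears twice.
sample = pd.DataFrame({
    "GIDBG": ["000000000001", "000000000002", "000000000001", "000000000003"],
    "Tot_Population_CEN_2010": [100, 200, 150, 300],
})
repeats = rows_with_repeated_ids(sample)
print(repeats)
```

How to handle the repeated rows (keep one, combine them, or treat them separately) depends on the analysis and on what the PDB documentation says about the geography changes.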