Public Use Microdata Samples
Using PDQ Explore Software
Grace York
University of Michigan Library
May 2004
2000 Census Data Tabulations
• Summary Files 1-4, Equal Employment Opportunity, School District Data, and Work Flow data are TABULATED data
• American FactFinder EXTRACTS the tabulated data
Public Use Microdata Samples
• Copies of the original questionnaires with identifying information edited out
• Create your own cross tabulations of census data
Typical PUMS Questions
• Single years of age by sex for teachers in Michigan (e.g. when will they retire?)
• Race of those with Arab ancestry (no, they are not all white)
• Demographic characteristics of immigrants from Senegal (age, sex, education, occupation, income, citizenship for a social survey)
• Age, race and sex of automotive industry employees (campaign for organ donations)
PUMS Software Programs
• FTP data from Census Bureau (and manipulate with SAS or SPSS)
http://www.census.gov/Press-Release/www/2003/PUMS5.html
• Census Bureau CD-ROMS (Beyond 20/20 software)
http://www.census.gov/mp/www/Tempcat/PUMS.html
• SDA Software for Michigan (UMich Only)
http://nds.umdl.umich.edu/n/nds/
• PDQ Explore
http://www.pdq.com
PDQ Explore Software
• Easy interface to
– Public Use Microdata Samples, 1 and 5%, 1980-2000
– IPUMS, edited PUMS, 1850-1880, 1900-1920, 1940-1990
– Current Population Survey, 1991+
– Mortality Schedules
• Permits users to tabulate their own variables
Access to PDQ
• Librarians may request free IDs, passwords, and software from PDQ
• Send e-mail to [email protected] stating that
– You are a librarian who talked to Grace York
– You are requesting an ID and password for using PDQ Explore
– You want to download software for the PDQ Toolbox, Expert Edition
http://www.pdq.com
Software
• Download the software per instructions to your hard drive
• To begin searching, open the icon on your desktop
Before Beginning…
Choose File
Two PUMS files – 1% and 5% sample
• 1% has data for the nation, states, MSAs and super-Pumas (areas of 400,000)
• 5% has data for the nation, states, MSAs and Pumas (areas of 100,000)
Before Beginning…
Define the data you want in terms of a spreadsheet. The longer part should be defined as rows rather than columns.
I want single years of age by sex for all Vietnam-era veterans in the United States
Universe = Vietnam-era veterans in the U.S.
Column = sex (not very wide)
Row = single years of age (could be long)
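The universe/row/column layout corresponds to an ordinary cross tabulation. A minimal Python sketch of the idea, assuming each person record is a dict carrying the AGE, SEX and VPS5 fields; the records are made-up toy data rather than census figures, the SEX codes 1 and 2 and VPS5=0 for "no" are assumptions, and `crosstab` is a hypothetical helper, not part of PDQ Explore:

```python
from collections import Counter

# Toy person records, invented for illustration only.
# vps5 == 1 marks a Vietnam-era veteran (from the slides); the other codes are assumed.
records = [
    {"age": 52, "sex": 1, "vps5": 1},
    {"age": 52, "sex": 2, "vps5": 1},
    {"age": 55, "sex": 1, "vps5": 1},
    {"age": 30, "sex": 1, "vps5": 0},
]

def crosstab(rows, universe, row_key, col_key):
    """Count the records inside the universe by (row value, column value)."""
    return Counter((r[row_key], r[col_key]) for r in rows if universe(r))

# Universe = Vietnam-era veterans, Row = age, Column = sex.
table = crosstab(records, lambda r: r["vps5"] == 1, "age", "sex")
for (age, sex), n in sorted(table.items()):
    print(age, sex, n)
```

Keeping the long variable (age) as the row key mirrors the advice to put the longer dimension in rows.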
Before Beginning…
Consult Chapter 7 of the PUMS codebook if you want to check the possible variables and the appendices for place/language/ancestry and occupation codes
http://www.census.gov/prod/cen2000/doc/pums.pdf
Chapter 7 is also available on the University of Michigan web site at:
http://www.lib.umich.edu/govdocs/census2/pums2000/pums7.pdf
Before Beginning…
Housing Record
• All geographic codes (state, MSA, PUMA)
• All housing records
• Some population records
Population Record
• All population variables
• OK to combine with geographic codes in housing
• Ask for help for other population/housing combinations at: [email protected]
Before Beginning…
Variable Codes for the Question in the Technical Documentation Data Dictionary
AGE Single Years of Age
SEX Male or Female
VPS5 Veteran’s Period of Service 5: On active duty during the Vietnam Era (Aug. 1964 to Apr. 1975)
http://www.lib.umich.edu/govdocs/census2/pums2000/pums7.pdf
Logging On
Enter the subscriber name and password that you were given by the PDQ staff
Logging On
Press OK to close the message of the day
Defining Workspace
• To conduct a new search, create a new workspace
• Press Finish or return twice
Defining Workspace
Name your file on your hard drive and save.
Defining Workspace
At the next screen, use the top menu to choose Workspace; then Add a Data Set
Defining Workspace
Browse data sets; highlight ipums, pums, cps, or mortality file; Open
Defining Variables
• Once you choose a data set, its codebook will open up
• Click on the plus button to get a list of variables, their alphabetic symbols, and any numeric values
Defining Variables
• Determine the alphanumeric variables you want (e.g. Vietnam-era veteran: yes is VPS5=1)
• Use the top menu to choose Query/Setup New Expert Query
(Access the codebook later through a tab on the desktop toolbar)
Expert Query Form
1. Make sure you have the correct data set
2. Determine if you want a tabulation (counts or numbers)
3. Name your file
Expert Query Form
Enter the code for UNIVERSE (what you’re counting) in the Universe box (e.g. vps5=1 are Vietnam-era veterans for the entire U.S.)
Expert Query Form
• Enter the code for the variables in the ROW box (age = single years of age; age/5 would be five-year age groups)
• Enter the code for the variables in the COLUMN box (e.g. sex)
• Press RESULTS to run the query
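The `age/5` row expression collapses single years of age into five-year bands. A sketch of the same arithmetic in Python, assuming the quotient is truncated to a whole number (the slides imply this but do not state it); `five_year_group` is a hypothetical helper used only for illustration:

```python
def five_year_group(age):
    """Return the (low, high) bounds of the five-year band containing age."""
    band = age // 5      # integer division, e.g. 52 // 5 == 10
    low = band * 5       # first single year in that band
    return low, low + 4  # last single year in that band

for age in (50, 52, 54, 55):
    print(age, five_year_group(age))
```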
Search Results
Search results appear in spreadsheet format
Saving Results
• Click on File/Export Query Results
• You can save as CSV, tab delimited and several other formats. CSV (WYSIWYG) is recommended for use with Excel
• Use the SETUP button to return to the query, or the icon at the bottom to review the codebook
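An exported CSV can also be read back outside Excel. A minimal sketch using Python's standard `csv` module; the file layout (a header row, with the first column holding the row label) is an assumption about the export, the counts are invented, and `load_table` is a hypothetical helper:

```python
import csv
import io

# Stand-in for an exported file. The layout and the numbers are assumed,
# not taken from an actual PDQ Explore export.
exported = io.StringIO("age,male,female\n52,10,2\n53,8,1\n")

def load_table(handle):
    """Return {row_label: {column_label: count}} from a CSV export."""
    reader = csv.DictReader(handle)
    label = reader.fieldnames[0]  # name of the first column, e.g. "age"
    return {
        row[label]: {k: int(v) for k, v in row.items() if k != label}
        for row in reader
    }

table = load_table(exported)
print(table["52"]["male"])
```

For a real export, replace the `io.StringIO` stand-in with `open(path, newline="")`.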
Geographic Codes
• Geographic codes are found in the Housing documentation
• Limit files to Michigan with the code state=26
• Click on Query/New Expert Query to continue
Narrowing the Universe
Narrow the universe by using & newcode (e.g. vps5=1 & state=26)
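Joining conditions with `&` means a record must satisfy every one of them. A Python sketch of that logic, using toy records; `vietnam_vet`, `in_michigan` and `both` are hypothetical names, and the non-Michigan state code is only a placeholder:

```python
def vietnam_vet(r):
    return r["vps5"] == 1

def in_michigan(r):
    return r["state"] == 26

def both(*tests):
    """Combine tests so a record must pass all of them (the role of &)."""
    return lambda r: all(t(r) for t in tests)

# Mirrors the universe expression: vps5=1 & state=26
universe = both(vietnam_vet, in_michigan)

# Toy records, invented for illustration.
people = [
    {"vps5": 1, "state": 26},  # veteran in Michigan: kept
    {"vps5": 1, "state": 39},  # veteran elsewhere: dropped
    {"vps5": 0, "state": 26},  # non-veteran in Michigan: dropped
]
print(sum(1 for p in people if universe(p)))
```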
Logical Operators in PDQ
http://www.lib.umich.edu/govdocs/census2/pdqop.pdf
& is one of numerous operators used in PDQ
Operator    Name                    Example/Comment
X:a..b      range                   age:15..44
unary +     plus                    sex=+1 (never needed)
unary -     minus                   income4<=-1000
*           multiply                73*income1/100
/           divide                  rhhinc/persons
%           modulo                  subsample%10
+           add                     income1+income2
-           subtract                rhhinc-rearning
<           less than               age<65
>           greater than            age>64
<=          less than or equal      age<=65
>=          greater than or equal   age>=65
= or ==     equal                   age=23
!= or <>    not equal               income!=0
& or &&     and                     race=2 & looking=1
^           exclusive or            bit-wise--use with caution
| or ||     or                      age<18 | age>=65
Altering the Spreadsheet Tabulations
Once you have a spreadsheet, click on Options to create totals or percentages for tables or columns
Adding More Parameters
Expand the table detail by repeating the row and column data for another parameter (e.g. race) as shown in Dimension 3
Altering Spreadsheet Appearance
• The default shows separate tables for each of the values in the third dimension (e.g. separate spreadsheets for white and black)
• Change the Axis3 tab to FOREACH to show everything on the same spreadsheet
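The difference between separate tables and a single combined spreadsheet is only a matter of how the same counts are grouped. A rough Python sketch of the two arrangements, using toy records with placeholder race labels; it illustrates the idea and is not a description of how PDQ Explore lays out its output:

```python
from collections import Counter, defaultdict

# Toy records, invented; "A" and "B" are placeholder labels, not codebook values.
people = [
    {"age": 52, "sex": 1, "race": "A"},
    {"age": 52, "sex": 2, "race": "A"},
    {"age": 52, "sex": 1, "race": "B"},
]

# Count every (third dimension, row, column) combination once.
counts = Counter((p["race"], p["age"], p["sex"]) for p in people)

# Separate view: one age-by-sex table per value of the third dimension.
separate = defaultdict(dict)
for (race, age, sex), n in counts.items():
    separate[race][(age, sex)] = n

# Combined view: a single table whose columns carry both race and sex.
combined = {(age, (race, sex)): n for (race, age, sex), n in counts.items()}

print(dict(separate))
print(combined)
```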
Calculating Means or Averages
• Calculate averages by changing the query type to summary statistics (e.g. mean or average) at the top
• Fill in the new Describe Expression box at the bottom with a variable code (e.g. age, income)
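A summary-statistics query replaces the count in each cell with a statistic of the described variable. A minimal Python sketch of a mean by row value, with invented incomes; `mean_by` is a hypothetical helper and the field names are illustrative:

```python
from collections import defaultdict
from statistics import mean

# Toy records with invented incomes.
people = [
    {"age": 52, "income": 40000},
    {"age": 52, "income": 20000},
    {"age": 53, "income": 30000},
]

def mean_by(rows, row_key, describe_key):
    """Average describe_key within each value of row_key."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[row_key]].append(r[describe_key])
    return {k: mean(v) for k, v in groups.items()}

# Row = age, Describe Expression = income.
print(mean_by(people, "age", "income"))
```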
Complex Table
Mean income of white male Vietnam-era veterans in Michigan by age, whether or not they have earnings
You can respecify only veterans with earnings
Altering Mean Income
Add & incws > 0 to universe to count only Vietnam-era veterans who are earning more than $0
Complex Table
Mean income is higher when data limited to wage-earning veterans
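The effect of restricting the universe can be checked with simple arithmetic. In the Python sketch below the records are invented, `incws` is assumed to be wage and salary income, and `income` stands in for whichever income variable is being averaged:

```python
from statistics import mean

# Toy records, invented for illustration.
vets = [
    {"income": 30000, "incws": 30000},
    {"income": 50000, "incws": 45000},
    {"income": 10000, "incws": 0},  # no wages, e.g. retired
]

# Universe without the earnings test.
all_vets = mean(v["income"] for v in vets)

# Universe with "& incws > 0" added.
wage_earners = mean(v["income"] for v in vets if v["incws"] > 0)

print(all_vets, wage_earners)
```

The mean rises here because the person without wages has below-average income; with other data the direction of the change could differ.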
Small Area Geography
• Data from the PUMS 5% file is available for states, metropolitan areas, and Public Use Microdata Areas (PUMAs) of 100,000
• You can identify a PUMA or group of PUMAs using
– Maps in American FactFinder (http://factfinder.census.gov/)
– PDF maps on the Census Bureau web site (http://www.census.gov/geo/www/maps/puma5pct.htm)
– Mable/Geocorr Search Engine (http://mcdc2.missouri.edu/websas/geocorr2k.html)
Small Area Geography
This map shows Detroit as PUMAs 3701-3708
PUMA Codes for Michigan
Ann Arbor 3200
Detroit 3701-3708
Flint 2200
Grand Rapids 1300
Lansing 1800
PUMA to Place
http://www.lib.umich.edu/govdocs/census2/pumapl00.txt
Place to PUMA
http://www.lib.umich.edu/govdocs/census2/plpuma00.txt
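The city codes listed on this slide can be kept as a small lookup table when preparing queries. A Python sketch using only those codes; `CITY_PUMAS` and `city_for_puma` are hypothetical names, and the format of the linked PUMA/place text files is not assumed here:

```python
# PUMA5 codes for Michigan cities as listed on the slide.
CITY_PUMAS = {
    "Ann Arbor": [3200],
    "Detroit": list(range(3701, 3709)),  # 3701 through 3708
    "Flint": [2200],
    "Grand Rapids": [1300],
    "Lansing": [1800],
}

def city_for_puma(code):
    """Return the listed city containing this PUMA code, or None."""
    for city, codes in CITY_PUMAS.items():
        if code in codes:
            return city
    return None

print(city_for_puma(3705))
print(city_for_puma(9999))
```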
Codebook and PUMAs
The Explore Codebook shows PUMA5 as term for 5% PUMA boundaries
Small Area Geography and Ranges
When creating data sets for PUMAs, be sure to include the correct state as the universe (e.g. state=26)
Small Area Geography and Ranges
Puma5: 3701..3708 will list the data for each individual area
Small Area Geography and Ranges
Search result for each individual PUMA
Small Area Geography for Ranges
To get the total for the area, list it in the universe as puma5 >3700 & puma5 <3709 & state=26
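Placing the range in the row box and placing it in the universe answer two different questions: a count per PUMA versus one total for the whole area. A Python sketch of both, with invented records; `in_detroit` is a hypothetical helper that mirrors the universe expression:

```python
from collections import Counter

# Toy records, invented; 3701-3708 are the Detroit PUMAs from the slides.
people = [
    {"state": 26, "puma5": 3701},
    {"state": 26, "puma5": 3701},
    {"state": 26, "puma5": 3708},
    {"state": 26, "puma5": 3200},  # Ann Arbor, outside the range
]

def in_detroit(p):
    # Mirrors: puma5 >3700 & puma5 <3709 & state=26
    return p["puma5"] > 3700 and p["puma5"] < 3709 and p["state"] == 26

# Range used as a row: one count for each individual PUMA.
per_puma = Counter(
    p["puma5"] for p in people
    if 3701 <= p["puma5"] <= 3708 and p["state"] == 26
)

# Range used in the universe: a single total for the whole area.
total = sum(1 for p in people if in_detroit(p))

print(dict(per_puma), total)
```

For whole-number codes, `>3700` and `<3709` select the same PUMAs as the inclusive range 3701 through 3708.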
Small Area Geography for Ranges
To get a listing of single years of age between 65 and 85, list column as age: 65..85
Calculating Totals
• To calculate the most spoken languages by 65-85 year olds as a group
• Click on Options/Total Options/Row
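A row total adds the counts across all the single-year age columns, which gives one figure per language that can then be ranked. A Python sketch of that step, with invented records; the language names stand in for the numeric codes used in the actual files:

```python
from collections import Counter

# Toy records, invented; real files store language as numeric codebook values.
seniors = [
    {"age": 65, "language": "Spanish"},
    {"age": 70, "language": "Polish"},
    {"age": 85, "language": "Spanish"},
    {"age": 86, "language": "Arabic"},  # outside 65..85, so excluded
]

# One total per language across all ages 65 through 85.
by_language = Counter(
    s["language"] for s in seniors if 65 <= s["age"] <= 85
)

# most_common() ranks the totals from highest to lowest.
for language, n in by_language.most_common():
    print(language, n)
```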
Complex Result
Spanish and Polish are the two most popular languages spoken by seniors 65-85 in Detroit
Contacts for Research Assistance
Initial Queries
Grace York, Documents Center, 203 Hatcher, [email protected] or 936-2378
JoAnn Dionne, Numeric and Spatial Data Services, 825 Hatcher, [email protected], 763-9408
Complex Data Sets
Lisa Neidert, Population Studies Center, 426 Thompson, [email protected], 763-2163
PDQ Staff, 310 Depot Street, Suite C, Ann Arbor 48104, [email protected]