TRANSCRIPT
2008 NCHS Data Users’ Conference
Omni Shoreham Hotel
Washington, DC
Wednesday, August 13, 2008
Session 43:
The Research Data Center:
Data Enclave of the NCHS
Session Coordinator and Presenter: Deborah Rose, Ph.D.
Speakers and topics (updated list)
Introduction to the RDC:
• Overview of the Research Data Center – What it is and what it does - Deborah Rose, Ph.D., M.P.H.
• Types of data available - Stephanie Robinson, B.A.
Speakers and topics (updated list, continued)
Examples of RDC research collaborations:
• Emergency medicine research and the RDC - Julius Cuong Pham, M.D., Ph.D.
• Assessing health and health care in the District of Columbia - Carole Roan Gresenz, Ph.D.
• Combining contextual variables with data from NHANES III - Chloe Bird, Ph.D.
What is the RDC?
• Location: A suite of offices in Hyattsville, Maryland
• Staff: Project managers who are experienced analysts
• Security: Keypad-access office suite, stand-alone computers
• Data: Public and confidential health and other information, combined and customized for your project
• Access: Onsite, remote, Census RDC
Start of the RDC
• Modeled after the Census Bureau Research Data Centers
• Opened in 1998
• Policies were developed to assure access and confidentiality
Two contradictory mandates
• Wide dissemination of the data - The Public Health Service Act of 1956 requires the collection and wide dissemination of data.
• Maintenance of confidentiality - the NCHS confidentiality statute, Section 308(d) of the Public Health Service Act, prohibits releasing collected information if the establishment or person supplying it is identifiable
Resolving the contradiction
• Summary tables of aggregate data are published (on paper or on the web)
• Public use datasets are released with person or institution level records
• Records do not include individual identifiers
• Variables that might allow record identification are suppressed
• Values based on small samples are suppressed
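The last rule above, suppressing values based on small samples, can be illustrated with a minimal Python sketch. The cutoff `MIN_CELL_SIZE = 5` and the function name `suppress_small_cells` are hypothetical and chosen only for illustration. The slides do not state the actual NCHS threshold, and real disclosure review weighs more than a single cell count.

```python
# Hypothetical minimum cell size; the actual NCHS disclosure threshold is not stated here.
MIN_CELL_SIZE = 5


def suppress_small_cells(table, min_n=MIN_CELL_SIZE):
    """Blank out estimates whose unweighted sample size is below min_n.

    `table` maps a cell label to a (sample_size, estimate) pair.
    Suppressed cells keep their label, but the estimate becomes None.
    """
    released = {}
    for label, (n, estimate) in table.items():
        released[label] = estimate if n >= min_n else None
    return released


if __name__ == "__main__":
    # Made-up cells for illustration only.
    cells = {
        "Group A": (120, 0.31),
        "Group B": (3, 0.67),   # too few sampled persons, so suppressed
        "Group C": (48, 0.12),
    }
    print(suppress_small_cells(cells))
```

The suppressed cell stays in the table with an empty value rather than being dropped, so a reader can see that a value exists but was withheld.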
When do you need the RDC?
Data needs and availability
• You have a research project or policy objective best served by analyzing representative, federally collected health data
• Public use data does not meet all your needs
• NCHS has, or can get, the data of interest
When do you need the RDC? (continued)
Analytic skills and computer access
• You or your staff have the skills to analyze individual-level data using a standard statistical package
• You can come to the NCHS RDC, a Census RDC, or have a secure email account to access our remote system.
When do you need confidential variables?
• Confidential information is directly related to your main research question.
• You need to link two or more datasets using small-area geographic identifiers (such as state, county or census tract) that are not publicly available.
• You need to select a subset of the population using criteria based on a confidential variable, such as exact age, date of interview, or membership in a small racial or ethnic group.
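The second case above, linking datasets on a geographic identifier, is sketched below in Python. The sketch only shows the logic of the link; the merge itself is done by RDC staff. The field names `state_fips`, `county_fips`, `person_id` and `context_score`, and the function `link_by_county`, are all hypothetical. The records and values are made up.

```python
def link_by_county(survey_records, county_context):
    """Attach county-level contextual variables to person-level survey records.

    survey_records: list of dicts carrying hypothetical 'state_fips' and
        'county_fips' keys (confidential geography, usable only inside the RDC).
    county_context: dict keyed by (state_fips, county_fips) whose values are
        dicts of contextual variables, e.g. from a user-supplied public file.
    Records with no matching county receive None for every contextual variable.
    """
    context_vars = sorted({var for values in county_context.values() for var in values})
    linked = []
    for record in survey_records:
        context = county_context.get((record["state_fips"], record["county_fips"]), {})
        merged = dict(record)  # copy so the input records are not modified
        for var in context_vars:
            merged[var] = context.get(var)
        linked.append(merged)
    return linked


if __name__ == "__main__":
    # Made-up records and values, for illustration only.
    survey = [
        {"person_id": 1, "state_fips": "11", "county_fips": "001", "age": 47},
        {"person_id": 2, "state_fips": "24", "county_fips": "033", "age": 52},
    ]
    context = {("11", "001"): {"context_score": 0.5}}
    for row in link_by_county(survey, context):
        print(row["person_id"], row["context_score"])
```

The sketch keeps unmatched records and gives them None rather than dropping them. An analyst can then count how many records failed to link before any output leaves the RDC.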
Types of Data
• NCHS Supplied: Confidential variables from the vital statistics system, any of the NCHS data collection systems, or files linked between systems
• User supplied: Public use or other data collected by other agencies, or compiled by the user
• See the next presentation for more detail
Major Steps
• See the website for the latest requirements
• Develop and submit a proposal
• We review it and accept, reject, or ask for revisions
• You sign the confidentiality agreements
• You send us the public use files
• We merge the public and confidential data
• We send you an invoice for the setup and usage costs
Major Steps (continued)
• You run your analyses
• We review the output for disclosure
• You publish
• Please send us a citation and copy of your published or reported work!
Components of the proposal
• Contact information
• Key study questions/Public health benefits
• Year, data system and dataset(s)
• Lists of public use and confidential variables
• Why publicly available data are insufficient
• Analysis/statistical methods/software
• Sample output table shells
NCHS User Fees
File construction and setup
• Mortality files = $250 per day
• All other files = $500 per day
NCHS User Fees (continued)
Access and Analysis
On-site
• $200 per day (2-10 days)
Remote
• NSFG-CDF = $500/year
• All other files = $500/month
• Each added survey cycle = $250/month
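The fee schedule above can be turned into a rough cost estimate, sketched below in Python. The function names are hypothetical. The rates are the 2008 figures from the slides and may since have changed. One part is an interpretation rather than something the slides state: the sketch assumes the $500 monthly remote rate covers one survey cycle and that each additional cycle adds $250 per month. The number of setup days is assumed to be an estimate supplied by RDC staff.

```python
# Fee rates (US dollars) as listed in the 2008 "NCHS User Fees" slides; current fees may differ.
SETUP_RATE_PER_DAY = {"mortality": 250, "other": 500}
ONSITE_RATE_PER_DAY = 200            # quoted for on-site visits of 2-10 days
REMOTE_NSFG_CDF_PER_YEAR = 500
REMOTE_OTHER_PER_MONTH = 500
REMOTE_EXTRA_CYCLE_PER_MONTH = 250


def setup_fee(file_type: str, days: int) -> int:
    """File construction and setup for 'mortality' or 'other' files, billed per day."""
    return SETUP_RATE_PER_DAY[file_type] * days


def onsite_fee(days: int) -> int:
    """On-site access; rejects visit lengths outside the quoted 2-10 day range."""
    if not 2 <= days <= 10:
        raise ValueError("the on-site rate is quoted for visits of 2-10 days")
    return ONSITE_RATE_PER_DAY * days


def remote_fee_other(months: int, extra_cycles: int = 0) -> int:
    """Remote access to files other than the NSFG-CDF.

    Assumption: the base monthly rate covers one survey cycle and each
    additional cycle adds $250 per month.
    """
    monthly = REMOTE_OTHER_PER_MONTH + extra_cycles * REMOTE_EXTRA_CYCLE_PER_MONTH
    return months * monthly


def remote_fee_nsfg_cdf(years: int) -> int:
    """Remote access to the NSFG-CDF, billed per year."""
    return years * REMOTE_NSFG_CDF_PER_YEAR


if __name__ == "__main__":
    # Hypothetical project: 2 setup days on a non-mortality file,
    # then 6 months of remote access to two survey cycles.
    total = setup_fee("other", 2) + remote_fee_other(6, extra_cycles=1)
    print(total)  # 1000 + 6 * 750 = 5500
```

For this hypothetical project, setup costs $1,000 and remote access costs $4,500, for a total of $5,500.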
ANDRE: ANalytical Data Research by Email
• Completely automated system
• Operates round the clock without any human intervention
• Registered subscribers only
– Proposals already reviewed and approved
– Confidentiality agreements have been signed
• Unlimited access during the subscription period
How ANDRE Works
• A registered subscriber sends an email to ANDRE with a SAS or SUDAAN program in an attachment
• ANDRE’s lead server authenticates the user through a password challenge and email
• Researchers never see data but run their programs against a data set prepared to their specifications by RDC staff
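The subscriber's side of the first step, an email with a program attached, can be sketched with Python's standard `email.message.EmailMessage`. The sketch only composes the message and does not send it. The ANDRE address, subject line, file name, dataset name `project.analysis` and variable `bmi` are all hypothetical placeholders. Authentication by password challenge happens on ANDRE's side and is not shown.

```python
from email.message import EmailMessage

# Hypothetical address: the real ANDRE submission address is provided to
# registered subscribers by the RDC and is not reproduced here.
ANDRE_ADDRESS = "andre@example.org"


def build_andre_request(sender: str, program_text: str,
                        filename: str = "analysis.sas") -> EmailMessage:
    """Compose (but do not send) an email carrying a SAS or SUDAAN program as an attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ANDRE_ADDRESS
    msg["Subject"] = "ANDRE job submission"   # hypothetical subject line
    msg.set_content("Program attached.")
    # Attach the program as a plain-text file. Per the slides, ANDRE runs it
    # against a dataset prepared by RDC staff; the researcher never sees the data.
    msg.add_attachment(program_text, subtype="plain", filename=filename)
    return msg


if __name__ == "__main__":
    # Hypothetical SAS program; dataset and variable names are placeholders.
    sas_program = "proc means data=project.analysis n mean;\n  var bmi;\nrun;\n"
    request = build_andre_request("researcher@example.org", sas_program)
    for part in request.iter_attachments():
        print(part.get_filename())
```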
NCHS RDC Usage Statistics
Average no. of projects 1998-2003 = 1.5/month
Average no. of projects 2004-2006 = 2.5/month
Average no. of proposals 2007 = 10/month
Current no. of active projects, 2007 = 146
Average no. of daily remote users 2007 = 18
Average no. of proposals 2008 = 7/month
Current no. of active projects, 2008 > 200
Average no. of daily remote users 2008 = 30
Visit the NCHS RDC website at:
http://www.cdc.gov/nchs/r&d/rdc.htm