Presentation at the Research Conference on Research Integrity, Niagara Falls, NY, May 16, 2009
Research Data in The Social Sciences: How Much is Being Shared?
Amy Pienta, Myron Gutmann, Jared Lyle
ICPSR, University of Michigan
Presentation at the Research
Conference on Research Integrity
Niagara Falls, NY
May 16, 2009
Types of Social Science Data
MAJOR SOCIAL SCIENCE TOPICS
• Social - class, crime, social movements, race relations, culture, folklore, family, aging
• Economic - wealth, prosperity, labor, business
• Psychological - cognition, attitudes, stereotypes
• Politics - justice, democracy, public policy, public administration, international conflict
TYPES OF DATA
• Surveys, Opinion Polls, Structured Interviews, Experiments, GIS (map)
• Administrative & Historical Records
• Video, Audio, Transcripts, Text
• Web sites, Email, Blogs
How Can We Think About Data Sharing?
• Making one’s research data available for others to analyze and/or reanalyze
• Placing one’s data in the public domain
• Data archive that has an explicit mission to preserve and disseminate data to a wide audience
Value of Data Sharing in the Social Sciences
• Replication
• Surveys are often more comprehensive than any one researcher’s needs/time
• Improve other data collections and measurement
• Reduces costs by avoiding duplicate data collection efforts
• Research training
• Data ownership larger than the PI
Many Avenues for Sharing Data in the Social Sciences
• Broad-based social science data archives
• National data archives (outside the US)
• Thematic “boutique” archives
• Institutional repositories
• Journal-based archives
• Individual/departmental websites
Why are data not shared?
• Preparing data and documentation can be enormously time consuming
• Need to protect the confidentiality of respondents
• Fear of getting “scooped”
• Lack of rewards for sharing
• Limited resources for data preparation
NSF Data Sharing Policy
National Science Foundation Important Notice 106 (April 17, 1989) states: "[NSF] expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections, and other supporting materials created or gathered in the course of the research. It also encourages awardees to share software and inventions or otherwise act to make such items or products derived from them widely useful and usable."
NIH Data Sharing Policy
The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers. Starting with the October 1, 2003 receipt date, investigators submitting an NIH application seeking $500,000 or more in direct costs in any single year are expected to include a plan for data sharing or state why data sharing is not possible.
Goals
To identify the “universe” of social science data that have been collected
To know how much social science data is “at risk” of being lost or has been lost (versus that which is available, preserved)
To understand the value of sharing and/or data archiving
LEADS Database at ICPSR
• NICHD funding – PI Survey about Disclosure Risks
• Library of Congress funding – Identification and Appraisal of “at risk” Social Science Data
• ORI RRI funding (NLM) – Creating a research database
What is LEADS?
A database of records containing information about thousands of scientific studies that may have produced social science data
The database contains:
Descriptive information about scientific studies we identify.
Information used to determine “fit” and “value” of a scientific study
Value-added information from bibliometric analysis, PI surveys, constructed variables
LEADS Screening Criteria
• Social science and/or behavioral science
• Original or primary data collection proposed, including assembling a database from existing (archival) sources
NSF Grant Awards in LEADS
LEADS contains 17,194 awards made by NSF
LEADS spans 30 years of NSF awards - 1976 to 2005
[Figure: NSF Grants Reviewed by ICPSR. Horizontal axis: Start Year (1970 to 2010); vertical axis: 0 to 900.]
NIH Grant Awards in LEADS
• NICHD, NIA, NIMH, NINR, AHRQ, NIAAA, NIDA, Clinical Center, NIDCD, FIC, NCI, NHLBI, NIDDK (1972+)
• 172,196 - total # awards screened
LEADS Database at ICPSR
                          # Records Reviewed   # Social Science Data
Recent NSF (1976+)        17,194               2,537
Historic NSF (Pre-1976)   96,403               4,019
NIH (1972+)               172,196              6,381
Total                     285,793              12,937
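The counts in the table can be turned into a rough share of reviewed records that were flagged as social science data. The sketch below is only an illustration of that arithmetic: the counts are copied from the table, while the names `LEADS_COUNTS` and `share_flagged` and the idea of reporting a percentage are assumptions that do not appear in the slides.

```python
# Counts copied from the LEADS table: (records reviewed, social science data).
# The dictionary name and the percentage calculation are illustrative assumptions.
LEADS_COUNTS = {
    "Recent NSF (1976+)": (17_194, 2_537),
    "Historic NSF (Pre-1976)": (96_403, 4_019),
    "NIH (1972+)": (172_196, 6_381),
}


def share_flagged(reviewed: int, flagged: int) -> float:
    """Return flagged records as a percentage of reviewed records."""
    return 100.0 * flagged / reviewed


# Column totals, which should match the table's bottom row.
total_reviewed = sum(reviewed for reviewed, _ in LEADS_COUNTS.values())
total_flagged = sum(flagged for _, flagged in LEADS_COUNTS.values())

if __name__ == "__main__":
    for source, (reviewed, flagged) in LEADS_COUNTS.items():
        print(f"{source}: {share_flagged(reviewed, flagged):.1f}% of {reviewed:,} reviewed")
    print(f"Total: {share_flagged(total_reviewed, total_flagged):.1f}% of {total_reviewed:,} reviewed")
```

Summing the columns this way also serves as a quick check that the table's bottom row (285,793 and 12,937) agrees with the three rows above it.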
Results: Total and By Funding Agency
[Bar chart, vertical axis 0 to 7: Total (n=7,040): 3.8; NIH (n=4,719): 2.4; NSF (n=2,321): 6.6]
Results: By Award Year
[Bar chart, vertical axis 0 to 5: 1985-1990 (n=1,989): 5; 1991-1996 (n=2,368): 4.1; 1997-2001 (n=2,683): 2.5]
Results: By Gender of PI
[Bar chart, vertical axis 0 to 6: Women (n=2,820): 1.8; Men (n=4,178): 5.1]
NSF & NIH Funded Data Collections: Where are they today? N=1,544
[Bar chart, vertical axis 0 to 60 percent: Data Are Archived: 14.2%; Has Copy of Data: 58.7%; Data Are Lost: 25.7%]
LEADS: How Data Are Lost
Data Intentionally Discarded
“I generally keep data for…10 years beyond the last time I do something with them.”
“The material…was considered sensitive data. Institutional review boards.. required us to promise to destroy the data after a certain period of time...”
“As I retired…I simply didn’t have the room to store these data sets at my house.”
LEADS: How Data Are Lost
Unintentionally Lost
“Some data were collected, but the data file was lost in a technical malfunction.”
“The data from the studies were on punched cards that were destroyed in a flood in the department in the early 80s.”